[GE users] drmaa return value for getJobProgramStatus

Daniel Templeton Dan.Templeton at Sun.COM
Sat Jun 30 21:17:56 BST 2007


Gah!  My apologies.  Please ignore my previous two emails.  That's what 
I get for writing emails at midnight.  I reread my emails this morning, 
and I don't know what I was thinking. :(

The implementation *does* already do what I was suggesting.  It uses the 
local job info cache to return the job state if the qmaster doesn't 
remember the job.  The reason why you're seeing the InvalidJobException 
is that you also have a thread doing a wait(ANY) call in a loop.  (Am I 
right?)  Once a wait() call has succeeded for a job, that job no longer 
exists.  Period.

There are two ways to deal with the problem.  Either have the wait 
thread notify the polling thread once a job has ended, or build the 
wait() call into the polling thread, to be made once a job's state is 
Session.DONE or Session.FAILED.
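
A minimal sketch of the second option, assuming one polling thread per
session: the poller itself reaps the job as soon as it sees a finished
state, so no other thread can consume the exit info first.  JobSession,
FakeSession and Poller are hypothetical stand-ins invented for this
example so that it compiles without the DRMAA jar.  In real code
status() would be Session.getJobProgramStatus() and reap() would be
Session.wait() with Session.TIMEOUT_NO_WAIT, and an unknown job would
raise InvalidJobException rather than IllegalStateException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the small part of org.ggf.drmaa.Session used here.
interface JobSession {
    int RUNNING = 0x20; // same values as Session.RUNNING, DONE and FAILED
    int DONE = 0x30;
    int FAILED = 0x40;

    int status(String jobId); // throws IllegalStateException if the job is unknown
    int reap(String jobId);   // returns the exit status and forgets the job
}

// In-memory fake that forgets a job once it has been reaped, mimicking
// the "after a successful wait() the job no longer exists" behaviour.
final class FakeSession implements JobSession {
    private final Map<String, Integer> states = new HashMap<>();

    void setState(String jobId, int state) {
        states.put(jobId, state);
    }

    public int status(String jobId) {
        Integer state = states.get(jobId);
        if (state == null) {
            throw new IllegalStateException("no such job: " + jobId);
        }
        return state;
    }

    public int reap(String jobId) {
        status(jobId);        // throws if the job is already gone
        states.remove(jobId); // the session no longer knows the job
        return 0;             // the fake always reports exit status 0
    }
}

// Polling side: checks the state and reaps the job itself when it has ended.
final class Poller {
    private final JobSession session;
    private final Map<String, Integer> exitStatus = new HashMap<>();

    Poller(JobSession session) {
        this.session = session;
    }

    // Polls once; returns true once the job has ended and been reaped.
    boolean pollOnce(String jobId) {
        if (exitStatus.containsKey(jobId)) {
            return true; // already reaped, do not ask the session again
        }
        int state = session.status(jobId);
        if (state == JobSession.DONE || state == JobSession.FAILED) {
            exitStatus.put(jobId, session.reap(jobId));
            return true;
        }
        return false;
    }

    // Exit status recorded when the job was reaped, or null if still running.
    Integer exitStatusOf(String jobId) {
        return exitStatus.get(jobId);
    }
}
```

Because the poller remembers which jobs it has already reaped, later
polls for the same job never go back to the session and so never hit
the "job does not exist" error.  This only holds if no separate
wait(ANY) loop is running against the same session.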

Sorry for the confusion.

Daniel

Ryan Golhar wrote:
> Thanks Daniel.  It seems a bit odd that there is a Session.DONE but it will
> never get used.  If the DRMAA implementation does have information, will it
> always work or is it just because of the session instance?  If the DRMAA
> implementation will always have information on completed jobs, then it makes
> sense to use that information, but if it's not guaranteed, then I don't know
> if that is the best solution (in my opinion).  In either case, I think it
> would be good to file it as an RFE.  How do I do that?
>
> Ryan
>
>
> -----Original Message-----
> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
> Sent: Saturday, June 30, 2007 3:11 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] drmaa return value for getJobProgramStatus
>
>
> Ryan,
>
> The exception happens because the qmaster disavows all knowledge of 
> finished jobs.  (Not exactly, but close enough for this discussion.)  
> Since the DRMAA implementation actually does have the information about 
> the job on hand, though, it really would make sense for the 
> getJobProgramStatus() method to use that information in the case of 
> finished jobs instead of only relying on the qmaster.  If you'd like to 
> file that as an RFE, that would be helpful.
>
> Thanks,
> Daniel
>
> Daniel Templeton wrote:
>   
>> Ryan,
>>
>> That is indeed how the implementation works.  To confirm that the
>> InvalidJobException from getJobProgramStatus() means that the job has 
>> ended, wait() for the job with the timeout set to 
>> Session.TIMEOUT_NO_WAIT.  If the job has finished, the wait() call 
>> will return its exit info, including why/how it exited.  If the job 
>> simply doesn't exist for some reason, you'll get another 
>> InvalidJobException.
>>
>> Daniel
>>
>> Ryan Golhar wrote:
>>     
>>> I'm able to successfully submit a job through DRMAA to the appropriate
>>> queue and set other settings.  If the job is running and I call
>>> getJobProgramStatus (Java), I get a return value of Session.RUNNING (32),
>>> which is correct.  Once the job completes and I call getJobProgramStatus,
>>> I get an exception about the job id not being valid:
>>>
>>> org.ggf.drmaa.InvalidJobException: The job specified by the 'jobid' does not exist.
>>>         at com.sun.grid.drmaa.SessionImpl.nativeGetJobProgramStatus(Native Method)
>>>         at com.sun.grid.drmaa.SessionImpl.getJobProgramStatus(SessionImpl.java:213)
>>>         at org.umdnj.JBLAST.LocalSGEBLAST.exeGet(LocalSGEBLAST.java:82)
>>>         at org.umdnj.JBLAST.BlastResultThread.run(BlastResultThread.java:62)
>>>
>>> I can interpret this exception as the job having completed; however, I
>>> don't think this is the correct way of doing things, as I can't tell if
>>> the job completed successfully or if something else happened.  Am I
>>> missing something?
>>>
>>> Ryan
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net