[GE users] drmaa return value for getJobProgramStatus

Daniel Templeton Dan.Templeton at Sun.COM
Sat Jun 30 21:20:37 BST 2007


Actually, I guess there is a third option.  In the DRMAA Hands-on Lab I 
did for JavaOne, I just had the polling thread stop asking about a job 
once it got the InvalidJobException, with the assumption that the job 
was now in the hands of the wait thread.
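
As a rough illustration of that third option, here is a minimal Java
sketch of a polling pass that drops a job the first time the status call
rejects it.  The nested InvalidJobException and StatusSource types are
hypothetical stand-ins for org.ggf.drmaa.InvalidJobException and
Session.getJobProgramStatus(), so that the sketch compiles without the
DRMAA jar.  PollingSketch and Poller are made-up names, and this is not
the code from the Hands-on Lab.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PollingSketch {
    // Stand-in for org.ggf.drmaa.InvalidJobException (hypothetical local copy
    // so the sketch compiles without the DRMAA jar).
    static class InvalidJobException extends Exception {
        InvalidJobException(String message) { super(message); }
    }

    // Stand-in for the single Session call the poller needs (hypothetical).
    interface StatusSource {
        int getJobProgramStatus(String jobId) throws InvalidJobException;
    }

    // Polls the watched jobs from one thread; not safe for concurrent use.
    static class Poller {
        private final StatusSource source;
        private final Set<String> watched = new LinkedHashSet<>();
        private final Map<String, Integer> lastStatus = new HashMap<>();

        Poller(StatusSource source) { this.source = source; }

        void watch(String jobId) { watched.add(jobId); }

        boolean isWatching(String jobId) { return watched.contains(jobId); }

        // Last status code seen for jobId, or null if none was ever reported.
        Integer lastStatus(String jobId) { return lastStatus.get(jobId); }

        // One polling pass; returns the job ids dropped during this pass.
        List<String> pollOnce() {
            List<String> dropped = new ArrayList<>();
            Iterator<String> it = watched.iterator();
            while (it.hasNext()) {
                String jobId = it.next();
                try {
                    lastStatus.put(jobId, source.getJobProgramStatus(jobId));
                } catch (InvalidJobException e) {
                    // Assume the wait thread now owns this job; stop asking.
                    it.remove();
                    dropped.add(jobId);
                }
            }
            return dropped;
        }
    }
}
```

With a real session, StatusSource could simply delegate to
session.getJobProgramStatus(jobId).  The sketch treats every
InvalidJobException as "the wait thread has it", which is only a safe
assumption if every watched job id came from a successful submission in
the same session.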

Daniel

Daniel Templeton wrote:
> Gah!  My apologies.  Please ignore my previous two emails.  That's 
> what I get for writing emails at midnight.  I reread my emails this 
> morning, and I don't know what I was thinking. :(
>
> The implementation *does* already do what I was suggesting.  It uses 
> the local job info cache to return the job state if the qmaster 
> doesn't remember the job.  The reason why you're seeing the 
> InvalidJobException is that you also have a thread doing a wait(ANY) 
> call in a loop.  (Am I right?)  Once a wait() call has succeeded for a 
> job, that job no longer exists.  Period.
>
> There are two ways to deal with the problem.  Either have the wait 
> thread notify the polling thread once a job has ended, or build the 
> wait() call into the polling thread after a job's state is DONE or 
> FAILED.
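
A minimal Java sketch of the first approach, where the wait thread tells
the polling thread which jobs have ended.  Everything here is
hypothetical: NotifySketch, FinishedJobs and ExitInfo are illustrative
names, and ExitInfo only stands in for whatever exit details the wait()
call returned.  The shared map is a ConcurrentHashMap so that the two
threads need no extra locking.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NotifySketch {
    // Stand-in for the exit details returned by a DRMAA wait() call (hypothetical).
    static class ExitInfo {
        final boolean exited;
        final int exitStatus;

        ExitInfo(boolean exited, int exitStatus) {
            this.exited = exited;
            this.exitStatus = exitStatus;
        }
    }

    // Shared registry: the wait thread writes to it, the polling thread reads.
    static class FinishedJobs {
        private final Map<String, ExitInfo> finished = new ConcurrentHashMap<>();

        // Called by the wait thread right after wait() has returned for jobId.
        void jobEnded(String jobId, ExitInfo info) { finished.put(jobId, info); }

        // Returns the recorded exit details, or null if the job has not been reaped.
        ExitInfo lookup(String jobId) { return finished.get(jobId); }
    }

    // What the polling thread would do for one job on each pass.
    static String describe(FinishedJobs finished, String jobId) {
        ExitInfo info = finished.lookup(jobId);
        if (info == null) {
            return "not reaped yet: ask the session for its status";
        }
        return info.exited
                ? "ended with exit status " + info.exitStatus
                : "ended without a normal exit";
    }
}
```

There is still a short window between wait() returning and jobEnded()
being called, so the polling thread should keep tolerating an
InvalidJobException for a job that is not yet in the registry.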
>
> Sorry for the confusion.
>
> Daniel
>
> Ryan Golhar wrote:
>> Thanks Daniel.  It seems a bit odd that there is a Session.DONE but it
>> will never get used.  If the DRMAA implementation does have
>> information, will it always work or is it just because of the session
>> instance?  If the DRMAA implementation will always have information on
>> completed jobs, then it makes sense to use that information, but if
>> it's not guaranteed, then I don't know if that is the best solution
>> (in my opinion).  In either case, I think it would be good to file it
>> as an RFE.  How do I do that?
>>
>> Ryan
>>
>>
>> -----Original Message-----
>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
>> Sent: Saturday, June 30, 2007 3:11 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] drmaa return value for getJobProgramStatus
>>
>>
>> Ryan,
>>
>> The exception happens because the qmaster disavows all knowledge of 
>> finished jobs.  (Not exactly, but close enough for this discussion.)  
>> Since the DRMAA implementation actually does have the information 
>> about the job on hand, though, it really would make sense for the 
>> getJobProgramStatus() method to use that information in the case of 
>> finished jobs instead of only relying on the qmaster.  If you'd like 
>> to file that as an RFE, that would be helpful.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Templeton wrote:
>>  
>>> Ryan,
>>>
>>> That is indeed how the implementation works.  To confirm that the
>>> InvalidJobException from getJobProgramStatus() means that the job 
>>> has ended, wait() for the job with the timeout set to 
>>> Session.TIMEOUT_NO_WAIT.  If the job has finished, the wait() call 
>>> will return its exit info, including why/how it exited.  If the job 
>>> simply doesn't exist for some reason, you'll get another 
>>> InvalidJobException.
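
The check described above can be sketched in Java roughly as follows.
To keep it compilable without the DRMAA jar, the nested Drm, ExitInfo and
InvalidJobException types are hypothetical stand-ins modeled on
org.ggf.drmaa.Session, JobInfo and InvalidJobException, and the value
given to Drm.TIMEOUT_NO_WAIT is an assumption of the sketch.  ReapSketch
and check() are made-up names.

```java
class ReapSketch {
    // Stand-in for org.ggf.drmaa.InvalidJobException (hypothetical local copy).
    static class InvalidJobException extends Exception {
        InvalidJobException(String message) { super(message); }
    }

    // Stand-in for the subset of org.ggf.drmaa.JobInfo used here (hypothetical).
    interface ExitInfo {
        boolean hasExited();
        int getExitStatus();
    }

    // Stand-in for the subset of org.ggf.drmaa.Session used here (hypothetical).
    interface Drm {
        long TIMEOUT_NO_WAIT = 0L; // assumed value, for this sketch only

        int getJobProgramStatus(String jobId) throws InvalidJobException;

        ExitInfo wait(String jobId, long timeout) throws InvalidJobException;
    }

    // Describes a job: its status code while the status call still knows it,
    // otherwise its exit details from a non-blocking wait(), otherwise "unknown job".
    static String check(Drm session, String jobId) {
        try {
            return "status code " + session.getJobProgramStatus(jobId);
        } catch (InvalidJobException statusFailure) {
            // The status call no longer knows the job; ask wait() without blocking.
            try {
                ExitInfo info = session.wait(jobId, Drm.TIMEOUT_NO_WAIT);
                return info.hasExited()
                        ? "finished, exit status " + info.getExitStatus()
                        : "finished abnormally";
            } catch (InvalidJobException waitFailure) {
                // Neither call recognizes the id, so the job simply does not exist.
                return "unknown job";
            }
        }
    }
}
```

A successful wait() reaps the job, so this check yields the exit details
only once per job; a caller that needs them later has to keep the result.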
>>>
>>> Daniel
>>>
>>> Ryan Golhar wrote:
>>>    
>>>> I'm able to successfully submit a job through DRMAA to the
>>>> appropriate queue and set other settings.  If the job is running and
>>>> I call getJobProgramStatus (Java), I get a return value of
>>>> Session.RUNNING (32), which is correct.  Once the job completes and I
>>>> call getJobProgramStatus, I get an exception about the job id not
>>>> being valid:
>>>>
>>>> org.ggf.drmaa.InvalidJobException: The job specified by the 'jobid' does not exist.
>>>>         at com.sun.grid.drmaa.SessionImpl.nativeGetJobProgramStatus(Native Method)
>>>>         at com.sun.grid.drmaa.SessionImpl.getJobProgramStatus(SessionImpl.java:213)
>>>>         at org.umdnj.JBLAST.LocalSGEBLAST.exeGet(LocalSGEBLAST.java:82)
>>>>         at org.umdnj.JBLAST.BlastResultThread.run(BlastResultThread.java:62)
>>>>
>>>> I can interpret this exception as meaning the job has completed, but I
>>>> don't think this is the correct way of doing things, as I can't tell
>>>> whether the job completed successfully or something else happened.
>>>> Am I missing something?
>>>> Ryan
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>



