[GE users] drmaa return value for getJobProgramStatus

Ryan Golhar golharam at umdnj.edu
Tue Jul 3 18:40:21 BST 2007


Dan,

I'm still stuck.  I've stripped down my code to the basics to test this:

1.  I initialize the grid engine
2.  I submit the job and get the jobid
3.  I poll the status of the jobid using getJobProgramStatus.  The first
time I call getJobProgramStatus, it returns QUEUED_ACTIVE.  

If I call Session.wait(..) immediately after, I get an exception saying that
the job does not exist.  However, qstat reports that the job is queued.

If I wait a few seconds, I see (using qstat) that the job starts running.
The program then calls getJobProgramStatus, which returns RUNNING.  I then
call Session.wait and get the same exception.
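
For readers following along, the sequence described above (submit, query
the status, then wait) can be captured in a small probe that records what
each call returned or threw.  This is only a rough sketch and is not the
attached program: MiniSession, NoSuchJobException, SequenceProbe, and
FakeQueuedSession are hypothetical stand-ins for org.ggf.drmaa.Session and
InvalidJobException so that it compiles without a DRMAA library, wait() is
simplified to return an exit status rather than a JobInfo, and the fake
session merely replays the reported symptom without explaining it.

```java
// Hypothetical stand-in for org.ggf.drmaa.InvalidJobException.
class NoSuchJobException extends Exception {
    NoSuchJobException(String message) { super(message); }
}

// Hypothetical, trimmed-down stand-in for org.ggf.drmaa.Session.
interface MiniSession {
    int QUEUED_ACTIVE = 0x10;          // same value as Session.QUEUED_ACTIVE
    long TIMEOUT_WAIT_FOREVER = -1L;   // same value as Session.TIMEOUT_WAIT_FOREVER

    String runJob();                                                 // returns the new job id
    int getJobProgramStatus(String jobId) throws NoSuchJobException;
    int wait(String jobId, long timeout) throws NoSuchJobException;  // simplified: exit status only
}

class SequenceProbe {
    // Runs submit -> status -> wait and records what each call returned or threw.
    static String probe(MiniSession session, long waitTimeout) {
        StringBuilder log = new StringBuilder();
        String jobId = session.runJob();
        log.append("submitted ").append(jobId);
        try {
            int status = session.getJobProgramStatus(jobId);
            log.append("; status=0x").append(Integer.toHexString(status));
        } catch (NoSuchJobException e) {
            log.append("; status failed: ").append(e.getMessage());
        }
        try {
            int exitStatus = session.wait(jobId, waitTimeout);
            log.append("; wait returned exit status ").append(exitStatus);
        } catch (NoSuchJobException e) {
            log.append("; wait failed: ").append(e.getMessage());
        }
        return log.toString();
    }
}

// Fake session that replays the reported symptom: the status call succeeds,
// but wait() claims the job does not exist.
class FakeQueuedSession implements MiniSession {
    public String runJob() { return "1"; }
    public int getJobProgramStatus(String jobId) { return QUEUED_ACTIVE; }
    public int wait(String jobId, long timeout) throws NoSuchJobException {
        throw new NoSuchJobException("job " + jobId + " does not exist");
    }
}
```

The timeout passed to wait() is left as a parameter because the message
does not say which value was used.  Logging each step separately makes it
clear which of the two calls raised the exception.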

I'm not sure what is wrong.  If the program status is RUNNING, then I would
expect Session.wait to work.  I've attached the java code I'm using.  Any
help would be appreciated.  Thanks,

Ryan


-----Original Message-----
From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
Sent: Saturday, June 30, 2007 4:21 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] drmaa return value for getJobProgramStatus


Actually, I guess there is a third option.  In the DRMAA Hands-on Lab I 
did for JavaOne, I just had the polling thread stop asking about a job 
once it got the InvalidJobException, with the assumption that the job 
was now in the hands of the wait thread.
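
A minimal sketch of that third option might look like the following.  It
is not the code from the Hands-on Lab: JobGoneException, StatusSource,
StatusPoller, and FakeSource are hypothetical names standing in for
org.ggf.drmaa.InvalidJobException and Session so that the example compiles
without a DRMAA library, and the fake source only simulates a wait thread
collecting the job after a few polls.

```java
// Hypothetical stand-in for org.ggf.drmaa.InvalidJobException.
class JobGoneException extends Exception {
    JobGoneException(String message) { super(message); }
}

// Hypothetical stand-in for the status part of org.ggf.drmaa.Session.
interface StatusSource {
    int RUNNING = 0x20; // same value as Session.RUNNING (32)
    int getJobProgramStatus(String jobId) throws JobGoneException;
}

class StatusPoller {
    // Polls until the source no longer knows the job, then stops asking.
    // Returns the last status seen, or -1 if the job was never visible.
    static int pollUntilReaped(StatusSource source, String jobId, long pauseMillis) {
        int lastStatus = -1;
        while (true) {
            try {
                lastStatus = source.getJobProgramStatus(jobId);
            } catch (JobGoneException e) {
                // Assumption: the wait thread has already collected this job.
                return lastStatus;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                return lastStatus;
            }
        }
    }
}

// Fake source: reports RUNNING a fixed number of times, then behaves as if
// another thread's wait() call had already succeeded for the job.
class FakeSource implements StatusSource {
    private int callsLeft;
    FakeSource(int callsBeforeReaped) { callsLeft = callsBeforeReaped; }
    public int getJobProgramStatus(String jobId) throws JobGoneException {
        if (callsLeft-- <= 0) {
            throw new JobGoneException("job " + jobId + " does not exist");
        }
        return RUNNING;
    }
}
```

The polling side never learns the exit status in this arrangement; that
information stays with whichever thread's wait() call collected the job.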

Daniel

Daniel Templeton wrote:
> Gah!  My apologies.  Please ignore my previous two emails.  That's
> what I get for writing emails at midnight.  I reread my emails this 
> morning, and I don't know what I was thinking. :(
>
> The implementation *does* already do what I was suggesting.  It uses
> the local job info cache to return the job state if the qmaster 
> doesn't remember the job.  The reason why you're seeing the 
> InvalidJobException is that you also have a thread doing a wait(ANY) 
> call in a loop.  (Am I right?)  Once a wait() call has succeeded for a 
> job, that job no longer exists.  Period.
>
> There are two ways to deal with the problem.  Either have the wait
> thread notify the polling thread once a job has ended, or build the 
> wait() call into the polling thread after a job's state is FINISHED or 
> ERROR.
>
> Sorry for the confusion.
>
> Daniel
>
> Ryan Golhar wrote:
>> Thanks Daniel.  It seems a bit odd that there is a Session.DONE but it
>> will never get used.  If the DRMAA implementation does have information,
>> will it always work or is it just because of the session instance?  If
>> the DRMAA implementation will always have information on completed jobs,
>> then it makes sense to use that information, but if it's not guaranteed,
>> then I don't know if that is the best solution (in my opinion).  In
>> either case, I think it would be good to file it as an RFE.  How do I do
>> that?
>>
>> Ryan
>>
>>
>> -----Original Message-----
>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
>> Sent: Saturday, June 30, 2007 3:11 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] drmaa return value for getJobProgramStatus
>>
>>
>> Ryan,
>>
>> The exception happens because the qmaster disavows all knowledge of
>> finished jobs.  (Not exactly, but close enough for this discussion.)  
>> Since the DRMAA implementation actually does have the information 
>> about the job on hand, though, it really would make sense for the 
>> getJobProgramStatus() method to use that information in the case of 
>> finished jobs instead of only relying on the qmaster.  If you'd like 
>> to file that as an RFE, that would be helpful.
>>
>> Thanks,
>> Daniel
>>
>> Daniel Templeton wrote:
>>  
>>> Ryan,
>>>
>>> That is indeed how the implementation works.  To confirm that the 
>>> InvalidJobException from getJobProgramStatus() means that the job 
>>> has ended, wait() for the job with the timeout set to 
>>> Session.TIMEOUT_NO_WAIT.  If the job has finished, the wait() call 
>>> will return its exit info, including why/how it exited.  If the job 
>>> simply doesn't exist for some reason, you'll get another 
>>> InvalidJobException.
>>>
>>> Daniel
>>>
>>> Ryan Golhar wrote:
>>>    
>>>> I'm able to successfully submit a job through DRMAA to the
>>>> appropriate queue and set other settings.  If the job is running
>>>> and I call getJobProgramStatus (Java), I get a return value of
>>>> Session.RUNNING (32), which is correct.  Once the job completes and I
>>>> call getJobProgramStatus, I get an exception about the job id not
>>>> being valid:
>>>>
>>>> org.ggf.drmaa.InvalidJobException: The job specified by the 'jobid' does not exist.
>>>>         at com.sun.grid.drmaa.SessionImpl.nativeGetJobProgramStatus(Native Method)
>>>>         at com.sun.grid.drmaa.SessionImpl.getJobProgramStatus(SessionImpl.java:213)
>>>>         at org.umdnj.JBLAST.LocalSGEBLAST.exeGet(LocalSGEBLAST.java:82)
>>>>         at org.umdnj.JBLAST.BlastResultThread.run(BlastResultThread.java:62)
>>>>
>>>> I can interpret this exception as meaning the job has completed;
>>>> however, I don't think this is the correct way of doing things, as I
>>>> can't tell if the job completed successfully or if something else
>>>> happened.  Am I missing something?
>>>>
>>>> Ryan
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>>>         


    [ Part 2: "Attached Text" ]



