[GE users] GRAM & Arco

Daniel Templeton Dan.Templeton at Sun.COM
Thu Jun 28 18:49:21 BST 2007


Sam,

You want to use drmaa_wait() with DRMAA_JOB_IDS_SESSION_ANY as the job 
id.  drmaa_wait() will then return whenever any job completes.  The 
job_id_out parameter is there so that in this case you can tell which 
job ended.
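
A minimal sketch of that wait-for-any loop follows. So that it compiles without a Grid Engine installation, the DRMAA call is hidden behind a function pointer. The names wait_any_fn, job_done_fn, dispatch_finished_jobs, fake_wait_any and record_job are hypothetical and not part of DRMAA. In a real adapter, the function passed as wait_any would wrap drmaa_wait() called with DRMAA_JOB_IDS_SESSION_ANY and would return non-zero once the binding reports that no jobs are left in the session.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a wrapper around
 *   drmaa_wait(DRMAA_JOB_IDS_SESSION_ANY, job_id_out, len, &stat, ...).
 * Assumed contract: returns 0 and fills job_id_out when some job has
 * finished, non-zero when there is nothing left to wait for. */
typedef int (*wait_any_fn)(char *job_id_out, size_t len, int *stat);

/* Hypothetical callback, invoked once per finished job. */
typedef void (*job_done_fn)(const char *job_id, int stat, void *ctx);

/* Block until any job ends, report it, and repeat until the wait
 * function signals that no jobs remain.  Returns the number of
 * finish events delivered. */
int dispatch_finished_jobs(wait_any_fn wait_any, job_done_fn on_done, void *ctx)
{
    char job_id[128];
    int stat = 0;
    int delivered = 0;

    while (wait_any(job_id, sizeof job_id, &stat) == 0) {
        on_done(job_id, stat, ctx);   /* job_id plays the role of job_id_out */
        delivered++;
    }
    return delivered;
}

/* Stub so the sketch runs without Grid Engine: three made-up job ids
 * "finish" out of submission order, then the stub reports no more jobs. */
static const char *fake_finished[] = { "101", "103", "102" };
static size_t fake_next = 0;

int fake_wait_any(char *job_id_out, size_t len, int *stat)
{
    if (fake_next >= sizeof fake_finished / sizeof fake_finished[0])
        return 1;                     /* nothing left to wait for */
    snprintf(job_id_out, len, "%s", fake_finished[fake_next++]);
    *stat = 0;
    return 0;
}

/* Example callback: append each finished id to a comma-separated
 * string.  ctx must point to a zero-initialized buffer of 64 bytes. */
void record_job(const char *job_id, int stat, void *ctx)
{
    char *seen = ctx;
    (void)stat;
    if (seen[0] != '\0')
        strncat(seen, ",", 63 - strlen(seen));
    strncat(seen, job_id, 63 - strlen(seen));
}
```

Calling dispatch_finished_jobs(fake_wait_any, record_job, buf) with a zeroed 64-byte buf delivers one callback per job in completion order. Swapping the stub for a real drmaa_wait() wrapper gives the select-like behavior Sam asked about, for job finish events only.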

Using the *_methods as callbacks is an interesting idea.  That in combination 
with drmaa_wait() would get all state changes other than hold and 
release, which probably aren't relevant.

There is one other alternative.  If you're not afraid of getting your 
hands dirty, it's not very difficult to write an event client for Grid 
Engine.  You can use DRMAA/JAPI and qevent as examples.  Once you've 
registered as an event client, you can get asynchronous notifications 
for whatever events you want.

Daniel

Samuel Meder wrote:
>
> On Jun 28, 2007, at 10:16 AM, Daniel Templeton wrote:
>
>> Sam,
>>
>> DRMAA does support asynchronous notification, but only for job finish 
>> events.
>
> Do you off-hand remember the functions for that? The only ones I could 
> find in this area are drmaa_synchronize() and drmaa_wait(), and neither 
> of these really provides usable semantics for asynchronous monitoring 
> of job finish events. One would really want something similar to 
> drmaa_synchronize() that works like select(), i.e. returns when any of 
> the jobs passed to it finishes (rather than all).
>
>> Extending that for other events is something that has been on the 
>> table for the next version of the standard.
>>
>
> Ok, in the meantime we're considering working around this issue by 
> utilizing the starter_method, suspend_method, resume_method and 
> terminate_method parameters to insert a callback into the grid-engine 
> execution path. Any thoughts on that? Are there any better solutions 
> you can think of?
>
> Thanks for the help.
>
> /Sam
>
>> Daniel
>>
>> Samuel Meder wrote:
>>>
>>> On Jun 28, 2007, at 7:22 AM, Daniel Templeton wrote:
>>>
>>>> Sam,
>>>>
>>>> Why in the world is your GRAM adapter using the reporting file?  
>>>> (Which GRAM adapter are you using, by the way?)  The whole purpose 
>>>> of the reporting file is to be destructively consumed by ARCo.  It 
>>>> was not foreseen that anything other than ARCo would use the 
>>>> reporting file, so there are no built-in settings to make ARCo 
>>>> leave the file intact.
>>>>
>>>
>>> We're using the GRAM adapter found here:
>>>
>>> http://www.lesc.ic.ac.uk/projects/SGE-GT4.html
>>>
>>> This adapter uses the reporting file to implement the SEG (scheduler 
>>> event generator) module for SGE. The SEG module essentially 
>>> continuously reads the reporting file and pushes job-related events 
>>> to GRAM (e.g. pending, running, etc.).
>>>
>>>> I'm still astounded that no one has built a GRAM adapter based on 
>>>> DRMAA yet.  It seems like it would be the simplest and most 
>>>> portable thing to do.  Maybe I should add it to my to-do list.
>>>>
>>>
>>> I took a quick look at the DRMAA C bindings and as far as I can tell 
>>> the biggest issue with using DRMAA to implement this functionality 
>>> is that the DRMAA API does not seem to provide any asynchronous 
>>> notification mechanism for job state changes (correct me if I am 
>>> wrong, but as far as I could tell all the job state monitoring 
>>> functions look like they are polling). Are there any lower level SGE 
>>> APIs that provide job state notifications by any chance?
>>>
>>> /Sam
>>>
>>>> Daniel
>>>>
>>>> Samuel Meder wrote:
>>>>>
>>>>> We've found a problem trying to use the current GRAM-SGE 
>>>>> adapter together with SGE ARCo: ARCo's dbwriter component 
>>>>> destructively consumes the logging (reporting) information that is 
>>>>> used by the GRAM event generator adapter for SGE, with the 
>>>>> consequence that jobs can be submitted via GRAM, but their state 
>>>>> will never be updated.
>>>>>
>>>>> Is there any way around this? This could be dealt with by either 
>>>>> enabling reporting to multiple files or by consuming the data in a 
>>>>> non-destructive fashion, but I didn't see any configuration 
>>>>> options for doing so.
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>> /Sam
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>