[GE users] Monitor/control arbitrary session jobs via DRMAA

Andreas Haas Andreas.Haas at Sun.COM
Mon Mar 7 15:11:36 GMT 2005


It would be interesting to know whether that patch could be used
to allow DRMAA/JAPI to be optionally used in such a way that
drmaa_wait(), drmaa_synchronize() and drmaa_job_ps() work for
arbitrary jobs. Much as with qevent, this would require every job
state transition event to be delivered to the DRMAA event client
in order to fully cover jobs of arbitrary sessions. With the
current 6.0 implementation, only those job state transition events
are propagated to the DRMAA C implementation that actually affect
jobs of the session initiated with that drmaa_init() call. That
restriction was introduced to limit data transfer from qmaster to
DRMAA, with scalability in mind.

The DRMAA spec does not require support for arbitrary jobs, but it
definitely allows it. In fact the spec even says it "SHOULD"
(RFC 2119) be supported. Because of the negative impact on
scalability, it would be desirable to keep single-session job
control/monitoring as the default but allow arbitrary jobs to be
controlled/monitored when the session is set up in a special
way. Possibly the drmaa_init() contact string could be used to
control, at session initialization time, whether jobs of arbitrary
sessions should be covered or not.
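
To make the contact string idea more concrete, here is a minimal
sketch in Python. It is not the DRMAA or Grid Engine API: the
"key=value;key=value" contact syntax, the session_scope option and
both function names are assumptions made up for illustration only.
It parses a hypothetical contact string at session setup and then
decides whether a job state transition event should be forwarded to
the session.

```python
def parse_contact(contact):
    """Parse a hypothetical 'key=value;key=value' contact string into a dict."""
    options = {}
    for part in (contact or "").split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            options[key.strip()] = value.strip()
    return options


def wants_event(job_id, session_jobs, options):
    """Return True if an event for job_id should be forwarded to this session.

    Default: only jobs submitted within this session are covered.
    With the hypothetical option session_scope=all, events for any
    job are forwarded.
    """
    if options.get("session_scope") == "all":
        return True
    return job_id in session_jobs


session_jobs = {"11", "12"}
default_opts = parse_contact("")
all_opts = parse_contact("session_scope=all")

print(wants_event("11", session_jobs, default_opts))  # job of this session
print(wants_event("99", session_jobs, default_opts))  # foreign job, default scope
print(wants_event("99", session_jobs, all_opts))      # foreign job, opted in
```

Keeping the session-only filter as the default means existing
clients would see no additional event traffic; only sessions that
opt in would pay the scalability cost.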

Please reply to dev@ mailing list.

Cheers,
Andreas

On Thu, 3 Mar 2005, Ron Chen wrote:

> Even if 200 clients are not a problem, it can still be
> troublesome if there are several thousand sync jobs.
>
> Since users usually want to wait for all their
> submitted jobs to finish, we can further extend qevent
> (how about calling it "qwait"?). To refresh memory, this
> is what I did in 2003:
>
> http://gridengine.sunsource.net/servlets/ReadMsg?msgId=10204&listName=dev
>
> So users submit their jobs with qsub as usual, but
> they collect all the job IDs. Then the extended
> qevent waits for all of those job IDs.
>
> Example:
> % qsub sleep
> Your job 11 ("sleep") has been submitted.
>
> % qsub sleep
> Your job 12 ("sleep") has been submitted.
>
> % qevent -wait 11 12
> Job 11 finished
> Job 12 finished
>
> And it could be more powerful if it supported a timeout
> and exited with an error when the timeout expired.
>
> (The patch I posted has a race condition -- if a job has
> already finished before we invoke "qevent -wait", then
> qevent will wait forever. So we need to subscribe to
> qmaster events first, then get a list of jobs from the
> qmaster, and only wait for the running/pending jobs.)
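
The subscribe-first ordering described in the quoted paragraph can
be sketched roughly as follows. This is a simulation in Python, not
qevent or qmaster code: FakeQmaster, subscribe(), list_active_jobs(),
finish() and wait_for_jobs() are all hypothetical names. It also
includes the suggested timeout, reported here as a False return
value rather than an exit code.

```python
import queue
import threading
import time


class FakeQmaster:
    """Stand-in for the qmaster; not a real Grid Engine interface."""

    def __init__(self, active_jobs):
        self.active = set(active_jobs)  # running or pending job IDs
        self.subscribers = []

    def subscribe(self):
        """Register an event client and return its event queue."""
        events = queue.Queue()
        self.subscribers.append(events)
        return events

    def list_active_jobs(self):
        """Return a snapshot of the running/pending job IDs."""
        return set(self.active)

    def finish(self, job_id):
        """Mark a job as finished and notify every subscriber."""
        self.active.discard(job_id)
        for events in self.subscribers:
            events.put(job_id)


def wait_for_jobs(qmaster, job_ids, timeout=None):
    """Wait until all job_ids have finished; return False on timeout."""
    events = qmaster.subscribe()                           # 1. subscribe first
    remaining = set(job_ids) & qmaster.list_active_jobs()  # 2. then snapshot
    deadline = None if timeout is None else time.monotonic() + timeout
    while remaining:                                       # 3. wait for the rest
        left = None if deadline is None else max(0.0, deadline - time.monotonic())
        try:
            finished = events.get(timeout=left)
        except queue.Empty:
            return False  # overall timeout expired
        remaining.discard(finished)
    return True


# Job 11 finished before we started waiting; job 12 is still running.
qm = FakeQmaster(active_jobs={"12"})
timer = threading.Timer(0.05, qm.finish, args=("12",))
timer.start()
print(wait_for_jobs(qm, ["11", "12"], timeout=2.0))
timer.join()
```

Because the subscription exists before the snapshot is taken, a job
that finishes in between is either already missing from the snapshot
or its finish event is already queued, so the wait cannot hang on it.
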
>
> Anyone interested in using this "qevent -wait"
> feature?
>
> :)
>
>  -Ron
>
> --- Stephan Grell - Sun Germany - SSG - Software
> Engineer <stephan.grell at sun.com> wrote:
> > We introduced the limit for two reasons (mainly the
> > first one):
> > 1) the file descriptor limit
> > 2) the risk of dragging down the qmaster
> >
> > I have not seen a huge impact on qmaster performance
> > from a large number of event clients. But then, I have
> > never done performance testing with more than 200 clients.
> >
> > Cheers,
> > Stephan
> >
> > Rayson Ho wrote:
> >
> > >>If you are using -sync or -now, qsub is registered with the qmaster as
> > >>an event client.  Because, as Charu said, event clients can drag down
> > >>the qmaster, there is a configurable limit to the number of them allowed
> > >>at a time, which defaults to 99.
> > >>
> > >>
> > >
> > >Hi Daniel,
> > >
> > >Will there be a "slave" event master??
> > >
> > >Rayson
> > >
> > >
> > >
> > >
> > >>Daniel
> > >>
> > >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
