[GE users] Job submission speed through DRMAA api.

Andreas Haas Andreas.Haas at Sun.COM
Fri Jan 21 10:03:57 GMT 2005


Irrespective of possible inefficiencies in the implementation, you
could add that with DRMAA, synchronization at session begin is
always necessary to guarantee that drmaa_wait() works reliably. For
that reason some drmaa_init() delay is unavoidable, even though
I fully agree there should be possibilities to lessen the
drmaa_init() delay.

I don't know the overall set-up used in that case, but as
long as the python script issues a number of drmaa_run_job()
calls, the session init delay should be negligible. Also, the
drmaa_init() overhead will pay off performance-wise as soon as
the DRMAA capability to synchronize with job finish is used
instead of qstat/qacct being used for that purpose.
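
To put rough numbers on that, here is a minimal sketch of the
amortization argument. The function name and its parameters are
hypothetical and not part of DRMAA; the 2 second init delay is the
figure measured further down in this thread, and 0.1 seconds per
drmaa_run_job() call is only the guessed upper bound from Daniel's
mail, not a measurement.

```c
#include <stddef.h>

/*
 * Average wall-clock cost per submitted job when a single DRMAA
 * session is reused for n_jobs submissions.
 *
 * init_s:    one-time session start-up delay in seconds
 *            (about 2 s according to the timings in this thread)
 * per_job_s: cost of one submission call in seconds (assumed value)
 * n_jobs:    number of submissions made in the session
 *
 * Returns -1.0 when n_jobs is 0, so that callers can detect misuse.
 */
double amortized_cost_per_job(double init_s, double per_job_s, size_t n_jobs)
{
    if (n_jobs == 0) {
        return -1.0;
    }
    /* The init delay is paid once and spread over all submissions. */
    return init_s / (double)n_jobs + per_job_s;
}
```

Under those assumptions, amortized_cost_per_job(2.0, 0.1, 1) is about
2.1 seconds per job, while amortized_cost_per_job(2.0, 0.1, 100) is
about 0.12 seconds per job, so the init delay only dominates when a
session submits very few jobs.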

HTH,
Andreas

On Fri, 21 Jan 2005, Daniel Templeton wrote:

> Duh.  I should have known that.  I wrote it. ;)
>
> The reason for the difference is that qsub is single-threaded unless a
> -sync or -now option is used.  DRMAA is always multi-threaded.  At the
> creation of the second thread, the second thread requests a current list
> of running jobs from the qmaster.  It and the main thread block until
> that list is received.
>
> OK, so we've accounted for 2 of 3.5 seconds.  The other 1.5 must be in
> the job templates and the drmaa_run_job() call.  The next interesting
> test would be to time how long the drmaa_run_job() call takes.  I
> suspect it's less than 0.1 seconds.
>
> Daniel
>
> Ron Chen wrote:
>
> >I looked into this problem today, here's what I found:
> >
> >qsub and drmaa_init() both eventually call
> >japi_init(), but drmaa_init() takes 2 seconds:
> >
> >% time ./a.out
> >0.000u 0.002s 0:01.96 0.0%      0+0k 0+0io 0pf+0w
> >
> >=======================================================
> >#include "drmaa.h"
> >
> >int main (int argc, char **argv) {
> >   char error[DRMAA_ERROR_STRING_BUFFER];
> >   drmaa_init (NULL, error, DRMAA_ERROR_STRING_BUFFER);
> >}
> >=======================================================
> >
> >I looked into why japi_init() can make such a
> >difference, and found that for the parameter "bool
> >enable_wait", qsub is false, while drmaa_init() is
> >true.
> >
> >And if I change enable_wait to false, it becomes:
> >
> >% time ./a.out
> >0.000u 0.001s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
> >
> >But then I read that enable_wait is needed to allow
> >japi_wait() and japi_synchronize() to function.
> >
> >And looking into why it took so much time waiting, I
> >turned on tracing:
> >
> > --> gdi_receive_sec_message() {
> > <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 190 }
> > --> sge_log() {  <- **most of the wait time is here**
> >    ../libs/evc/sge_event_client.c 2695 commlib returns got no message
> > <-- get_event_list() ../libs/evc/sge_event_client.c 2713 }
> > --> sge_get_qmaster_port() {
> >
> >Stack trace:
> > get_event_list()
> > ec_get()
> > japi_implementation_thread()
> > start_thread ()
> >
> >So commlib was waiting for qmaster to return data but
> >found none?
> >
> > -Ron
> >
> >
> >--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
> >
> >
> >>That's interesting.  I don't see why a mutex lock would be expensive
> >>when there's no contention...  Something else to look into when I
> >>have time.
> >>
> >>Daniel
> >>
> >>Fred L Youhanaie wrote:
> >>
> >>>Hi,
> >>>
> >>>Here is something that I've noticed, in case it is helpful.
> >>>
> >>>A few weeks ago I was playing with the DRMAA C API on my laptop and I
> >>>did notice a 2 second delay when running the simple init/exit example
> >>>from
> >>>http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html
> >>>
> >>>When run via strace, most of the delay was in the two futex syscalls
> >>>called from drmaa_init. I have SGE 6.0u1, FC2, 2.6.9, no bdb, qmaster
> >>>and execd are all on the same host.
> >>>
> >>>HTH
> >>>
> >>>Cheers
> >>>f.
> >>>
> >>>Kevin Ruland wrote:
> >>>
> >>>>All,
> >>>>
> >>>>My situation is a little more complex than I first let on. First I'm
> >>>>not using DRMAA directly, but rather through python swig wrappers.  So
> >>>>I need to figure out what I'm doing in there, recode in straight C and
> >>>>see if there is any significant difference.  So I don't have a nice
> >>>>test case yet.
> >>>>
> >>>>I also don't have hard numbers.  It's all a feeling now.
> >>>>
> >>>>Kevin
> >>>>
> >>>>Ron Chen wrote:
> >>>>
> >>>>>Kevin,
> >>>>>
> >>>>>Do you have a testcase?
> >>>>>
> >>>---------------------------------------------------------------------
> >>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> --
> ***************************************************
> *        Daniel Templeton   ERGB01 x60220         *
> *       Staff Engineer, Sun N1 Grid Engine        *
> ***************************************************
> * "Roads? Where we're going we don't need roads." *
> *                    -Dr. Emmett Brown            *
> *                     Back to the Future (1985)   *
> ***************************************************

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



