[GE users] Job submission speed through DRMAA api.

Ron Chen ron_chen_123 at yahoo.com
Thu Jan 20 14:48:49 GMT 2005


I looked into this problem today. Here's what I found:

qsub and drmaa_init() both eventually call
japi_init(), but drmaa_init() takes 2 seconds:

% time ./a.out
0.000u 0.002s 0:01.96 0.0%      0+0k 0+0io 0pf+0w

=======================================================
#include "drmaa.h"

int main (int argc, char **argv) {
   char error[DRMAA_ERROR_STRING_BUFFER];

   /* NULL contact: use the default DRM system */
   drmaa_init (NULL, error, DRMAA_ERROR_STRING_BUFFER);
   return 0;
}
=======================================================

I looked into why japi_init() can make such a
difference, and found that for the parameter "bool
enable_wait", qsub passes false, while drmaa_init()
passes true.

And if I change enable_wait to false, it becomes:

% time ./a.out
0.000u 0.001s 0:00.00 0.0%      0+0k 0+0io 0pf+0w

However, I then read that enable_wait is needed to
allow japi_wait() and japi_synchronize() to function.

To see why it took so much time waiting, I turned on
tracing:

 --> gdi_receive_sec_message() {
 <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 190 }
 --> sge_log() {  <- **most of the wait time is here**
    ../libs/evc/sge_event_client.c 2695 commlib returns got no message
 <-- get_event_list() ../libs/evc/sge_event_client.c 2713 }
 --> sge_get_qmaster_port() {

Stack trace:
 get_event_list()
 ec_get()
 japi_implementation_thread()
 start_thread ()

So commlib was waiting for qmaster to return data but
found none?

 -Ron


--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
> That's interesting.  I don't see why a mutex lock would be expensive
> when there's no contention...  Something else to look into when I
> have time.
> 
> Daniel
> 
> Fred L Youhanaie wrote:
> 
> > 
> > Hi,
> > 
> > Here is something that I've noticed, in case it is helpful.
> > 
> > A few weeks ago I was playing with the DRMAA C API on my laptop and I
> > did notice a 2 second delay when running the simple init/exit example
> > from http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html
> > 
> > When run via strace, most of the delay was in the two futex syscalls
> > called from drmaa_init. I have SGE 6.0u1, FC2, 2.6.9, no bdb, qmaster
> > and execd are all on the same host.
> > 
> > HTH
> > 
> > Cheers
> > f.
> > 
> > 
> > Kevin Ruland wrote:
> > 
> >>
> >> All,
> >>
> >> My situation is a little more complex than I first let on. First I'm
> >> not using DRMAA directly, but rather through python swig wrappers.  So
> >> I need to figure out what I'm doing in there, recode in straight C and
> >> see if there is any significant difference.  So I don't have a nice
> >> test case yet.
> >>
> >> I also don't have hard numbers.  It's all a feeling now.
> >>
> >> Kevin
> >>
> >> Ron Chen wrote:
> >>
> >>> Kevin,
> >>>
> >>> Do you have a testcase?



		


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



