[GE users] Job submission speed through DRMAA api.

Ron Chen ron_chen_123 at yahoo.com
Fri Jan 21 16:27:15 GMT 2005


But from my debug dump, you can see that the second
thread blocks but never receives anything from
qmaster:

"../libs/evc/sge_event_client.c 2695 commlib returns
got no message"

So does the second thread time out while waiting?

 -Ron
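Whether the waiting thread times out can be pictured with a condition-variable wait that has a deadline, which is the usual POSIX way to block "until the reply arrives or N ms pass". The sketch below is a generic pthreads illustration under that assumption, not the code in sge_event_client.c: struct reply_box, wait_for_reply() and post_reply() are hypothetical names. pthread_cond_timedwait() gives up once its absolute deadline passes, so the caller can tell "reply arrived" apart from "gave up waiting".

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state: a flag that a receiving thread sets once a
 * reply has arrived.  Not a Grid Engine data structure. */
struct reply_box {
    pthread_mutex_t lock;
    pthread_cond_t  arrived;
    bool            ready;
};

/* Block until box->ready is true or timeout_ms has elapsed.
 * Returns true if the reply arrived, false if the wait timed out. */
bool wait_for_reply(struct reply_box *box, long timeout_ms)
{
    struct timespec deadline;

    /* pthread_cond_timedwait() takes an absolute deadline, measured
     * against CLOCK_REALTIME unless the condvar was set up otherwise. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&box->lock);
    while (!box->ready) {
        /* Re-check the flag after every wakeup (spurious wakeups are
         * allowed); stop on ETIMEDOUT or any other error code. */
        int rc = pthread_cond_timedwait(&box->arrived, &box->lock, &deadline);
        if (rc != 0)
            break;
    }
    bool got_reply = box->ready;
    pthread_mutex_unlock(&box->lock);
    return got_reply;
}

/* Called by the receiving thread once the reply is in. */
void post_reply(struct reply_box *box)
{
    pthread_mutex_lock(&box->lock);
    box->ready = true;
    pthread_cond_broadcast(&box->arrived);
    pthread_mutex_unlock(&box->lock);
}
```

Whether commlib's receive path really uses a timeout of this kind, and how long it is, would still have to be checked in the Grid Engine sources.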

--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
> Duh.  I should have known that.  I wrote it. ;)
> 
> The reason for the difference is that qsub is
> single-threaded unless the -sync or -now option is
> used.  DRMAA is always multi-threaded.  When the
> second thread is created, it requests a current list
> of running jobs from the qmaster.  Both it and the
> main thread block until that list is received.
> 
> OK, so we've accounted for 2 of the 3.5 seconds.
> The other 1.5 must be in the job templates and the
> drmaa_run_job() call.  The next interesting test
> would be to time how long the drmaa_run_job() call
> takes.  I suspect it's less than 0.1 seconds.
> 
> Daniel
> 
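Daniel's suggested next test, timing drmaa_run_job() on its own, needs a finer measurement than running time on the whole program. A minimal sketch, assuming a POSIX system with CLOCK_MONOTONIC: time_call() is a hypothetical helper that reports the wall-clock seconds spent in a single function call, and sleep_ms() is only a stand-in workload for checking the helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Wall-clock seconds spent in fn(arg).  A monotonic clock is used so
 * that system clock adjustments during the call do not skew the result. */
double time_call(void (*fn)(void *), void *arg)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Stand-in workload for testing the helper: sleeps *(long *)arg ms. */
void sleep_ms(void *arg)
{
    long ms = *(long *)arg;
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}
```

To time drmaa_run_job(), one would wrap that call (with the job template already created and its attributes set) in a small function of the same void (*)(void *) shape and pass it to time_call(), so template setup and job submission are measured separately.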
> Ron Chen wrote:
> 
> >I looked into this problem today, here's what I found:
> >
> >qsub and drmaa_init() both eventually call
> >japi_init(), but drmaa_init() takes 2 seconds:
> >
> >% time ./a.out
> >0.000u 0.002s 0:01.96 0.0%      0+0k 0+0io 0pf+0w
> >
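> >=======================================================
> >#include <stdio.h>
> >#include "drmaa.h"
> >
> >int main (void) {
> >   char error[DRMAA_ERROR_STRING_BUFFER];
> >   if (drmaa_init (NULL, error, DRMAA_ERROR_STRING_BUFFER)
> >       != DRMAA_ERRNO_SUCCESS) {
> >      fprintf (stderr, "drmaa_init() failed: %s\n", error);
> >      return 1;
> >   }
> >   drmaa_exit (error, DRMAA_ERROR_STRING_BUFFER);
> >   return 0;
> >}
> >=======================================================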
> >
> >I looked into why japi_init() makes such a
> >difference, and found that for the parameter "bool
> >enable_wait", qsub passes false, while drmaa_init()
> >passes true.
> >
> >And if I change enable_wait to false, it becomes:
> >
> >% time ./a.out
> >0.000u 0.001s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
> >
> >But then I read that enable_wait is needed for
> >japi_wait() and japi_synchronize() to function.
> >
> >To find out why it takes so much time waiting, I
> >turned on tracing:
> >
> > --> gdi_receive_sec_message() {
> > <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 190 }
> > --> sge_log() {  <- **most of the wait time is here**
> >    ../libs/evc/sge_event_client.c 2695 commlib returns got no message
> > <-- get_event_list() ../libs/evc/sge_event_client.c 2713 }
> > --> sge_get_qmaster_port() {
> >
> >Stack trace:
> > get_event_list()
> > ec_get()
> > japi_implementation_thread()
> > start_thread ()
> >
> >So commlib was waiting for qmaster to return data,
> >but found none?
> >
> > -Ron
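The effect of enable_wait found above, together with Daniel's explanation that a second thread fetches the running-job list while the main thread waits, can be modelled with a small toy. Everything here is hypothetical: toy_init() and toy_exit() are not japi functions, and the qmaster round trip is replaced by a sleep of a chosen length. When the flag is set, initialisation starts a background thread and blocks on a condition variable until that thread reports the list has arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Toy model only: these names are hypothetical, not japi code. */
static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  list_received = PTHREAD_COND_INITIALIZER;
static bool            have_list;
static bool            thread_started;
static long            fetch_delay_ms;
static pthread_t       event_thread;

/* Background "event client": fetch the running-job list (a sleep stands
 * in for the qmaster round trip), then wake the initialising thread. */
static void *event_client(void *unused)
{
    (void)unused;
    struct timespec ts = { fetch_delay_ms / 1000,
                           (fetch_delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);

    pthread_mutex_lock(&init_lock);
    have_list = true;
    pthread_cond_signal(&list_received);
    pthread_mutex_unlock(&init_lock);
    return NULL;  /* a real event client would keep listening here */
}

/* enable_wait false: start nothing and return at once.
 * enable_wait true: start the event client and block until the first
 * job list has been received. */
int toy_init(bool enable_wait, long delay_ms)
{
    if (!enable_wait)
        return 0;

    have_list = false;
    fetch_delay_ms = delay_ms;
    if (pthread_create(&event_thread, NULL, event_client, NULL) != 0)
        return -1;
    thread_started = true;

    pthread_mutex_lock(&init_lock);
    while (!have_list)                  /* guard against spurious wakeups */
        pthread_cond_wait(&list_received, &init_lock);
    pthread_mutex_unlock(&init_lock);
    return 0;
}

/* Join the background thread, if one was started. */
void toy_exit(void)
{
    if (thread_started) {
        pthread_join(event_thread, NULL);
        thread_started = false;
    }
}
```

With enable_wait false, toy_init() returns immediately, in line with the near-zero time measured above; with it true, the call lasts at least as long as the simulated fetch, which is the shape of the 2-second delay.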
> >
> >
> >--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
> >
> >>That's interesting.  I don't see why a mutex lock
> >>would be expensive when there's no contention...
> >>Something else to look into when I have time.
> >>
> >>Daniel
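Daniel's point about uncontended locks can be checked directly. On Linux with NPTL, an uncontended pthread mutex is normally locked and unlocked with atomic instructions in user space, and the futex() system call is typically entered only when a thread has to sleep or wake a waiter. So futex time under strace more likely reflects waiting (for example a condition wait) than locking overhead. The rough micro-benchmark below measures the uncontended cost; uncontended_lock_ns() is a hypothetical name, and the figure it returns depends on the machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Average nanoseconds per lock/unlock pair on a mutex that no other
 * thread ever touches, i.e. the uncontended fast path. */
double uncontended_lock_ns(long iterations)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_mutex_destroy(&m);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iterations;
}
```

If the per-pair figure comes out in the tens of nanoseconds, a couple of uncontended locks cannot account for a 2-second delay, and the time is more plausibly spent blocked waiting for data.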
> >>
> >>Fred L Youhanaie wrote:
> >>
> >>>Hi,
> >>>
> >>>Here is something that I've noticed, in case it is
> >>>helpful.
> >>>
> >>>A few weeks ago I was playing with the DRMAA C API
> >>>on my laptop and I noticed a 2-second delay when
> >>>running the simple init/exit example from
> >>>http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html
> >>>
> >>>When run via strace, most of the delay was in the
> >>>two futex syscalls called from drmaa_init.  I have
> >>>SGE 6.0u1, FC2, kernel 2.6.9, no bdb; qmaster and
> >>>execd are both on the same host.
> >>>
> >>>HTH
> >>>
> >>>Cheers
> >>>f.
> >>>
> >>>Kevin Ruland wrote:
> >>>
> >>>>All,
> >>>>
> >>>>My situation is a little more complex than I first
> >>>>let on.  First, I'm not using DRMAA directly, but
> >>>>rather through Python SWIG wrappers.  So I need to
> >>>>figure out what I'm doing in there, recode it in
> >>>>straight C, and see if there is any significant
> >>>>difference.  So I don't have a nice test case yet.
> >>>>
> >>>>I also don't have hard numbers.  It's all a
> >>>>feeling now.
> >>>>
> >>>>Kevin
> >>>>
> >>>>Ron Chen wrote:
> >>>>
> >>>>>Kevin,
> >>>>>
> >>>>>Do you have a testcase?
> 
=== message truncated ===



		

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



