[GE users] Job submission speed through DRMAA API.

Daniel Templeton Dan.Templeton at Sun.COM
Fri Jan 21 09:26:48 GMT 2005


Duh.  I should have known that.  I wrote it. ;)

The reason for the difference is that qsub is single-threaded unless the
-sync or -now option is used, whereas DRMAA is always multi-threaded.
When the second thread is created, it requests a current list of running
jobs from the qmaster, and both it and the main thread block until that
list is received.

OK, so we've accounted for 2 of the 3.5 seconds.  The other 1.5 seconds
must be spent in the job templates and the drmaa_run_job() call.  The
next interesting test would be to time how long the drmaa_run_job() call
takes.  I suspect it's less than 0.1 seconds.

Daniel

Ron Chen wrote:

>I looked into this problem today, here's what I found:
>
>qsub and drmaa_init() both eventually call
>japi_init(), but drmaa_init() takes 2 seconds:
>
>% time ./a.out
>0.000u 0.002s 0:01.96 0.0%      0+0k 0+0io 0pf+0w
>
>=======================================================
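>#include <stdio.h>
>#include "drmaa.h"
>
>int main (int argc, char **argv) {
>   char error[DRMAA_ERROR_STRING_BUFFER];
>   /* Only drmaa_init() is exercised; it is the call being timed. */
>   if (drmaa_init (NULL, error, DRMAA_ERROR_STRING_BUFFER) != DRMAA_ERRNO_SUCCESS) {
>      fprintf (stderr, "drmaa_init() failed: %s\n", error);
>      return 1;
>   }
>   return 0;
>}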
>=======================================================
>
>I looked into why japi_init() makes such a
>difference, and found that qsub passes false for the
>parameter "bool enable_wait", while drmaa_init()
>passes true.
>
>And if I change enable_wait to false, it becomes:
>
>% time ./a.out
>0.000u 0.001s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
>
>But then I read that enable_wait is needed for
>japi_wait() and japi_synchronize() to function.
>
>To find out why it spends so much time waiting, I
>turned on tracing:
>
> --> gdi_receive_sec_message() {
> <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 190 }
> --> sge_log() {  <- **most of the wait time is here**
>    ../libs/evc/sge_event_client.c 2695 commlib returns got no message
> <-- get_event_list() ../libs/evc/sge_event_client.c 2713 }
> --> sge_get_qmaster_port() {
>
>Stack trace:
> get_event_list()
> ec_get()
> japi_implementation_thread()
> start_thread ()
>
>So commlib was waiting for qmaster to return data but
>found none?
>
> -Ron
>
>
>--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
>
>>That's interesting.  I don't see why a mutex lock would be expensive
>>when there's no contention...  Something else to look into when I
>>have time.
>>
>>Daniel
>>
>>Fred L Youhanaie wrote:
>>
>>>Hi,
>>>
>>>Here is something that I've noticed, in case it is helpful.
>>>
>>>A few weeks ago I was playing with the DRMAA C API on my laptop and I
>>>noticed a 2-second delay when running the simple init/exit example
>>>from http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html
>>>
>>>When run via strace, most of the delay was in the two futex syscalls
>>>called from drmaa_init.  I have SGE 6.0u1, FC2, kernel 2.6.9, no bdb;
>>>qmaster and execd are all on the same host.
>>>
>>>HTH
>>>
>>>Cheers
>>>f.
>>>
>>>
>>>Kevin Ruland wrote:
>>>
>>>>All,
>>>>
>>>>My situation is a little more complex than I first let on.  First, I'm
>>>>not using DRMAA directly, but rather through Python SWIG wrappers.  So
>>>>I need to figure out what I'm doing in there, recode it in straight C,
>>>>and see if there is any significant difference.  That means I don't
>>>>have a nice test case yet.
>>>>
>>>>I also don't have hard numbers.  It's all a feeling for now.
>>>>
>>>>Kevin
>>>>
>>>>Ron Chen wrote:
>>>>
>>>>>Kevin,
>>>>>
>>>>>Do you have a testcase?
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
***************************************************
*        Daniel Templeton   ERGB01 x60220         *
*       Staff Engineer, Sun N1 Grid Engine        *
***************************************************
* "Roads? Where we're going we don't need roads." *
*                    -Dr. Emmett Brown            *
*                     Back to the Future (1985)   *
***************************************************



More information about the gridengine-users mailing list