[GE users] Job submission speed through DRMAA api.

Daniel Templeton Dan.Templeton at Sun.COM
Fri Jan 21 16:29:48 GMT 2005


Ah...  There are two different things going on here.  The first is the
wait for the job list.  After the job list comes, the second thread then
blocks waiting for job events.  That's what you're seeing in the trace.
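
One way to separate these two waits from the cost of the individual DRMAA
calls is to time each call on its own rather than timing the whole process.
Here is a minimal sketch of such a timer in C.  The names time_call,
timed_fn and fake_slow_call are made up for illustration and are not part
of DRMAA or Grid Engine.  fake_slow_call only sleeps; in a real test you
would replace it with a small wrapper around drmaa_init() or
drmaa_run_job().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: wraps the one call we want to time and
 * returns that call's status code. */
typedef int (*timed_fn)(void *ctx);

/* Run fn(ctx) once, store the elapsed wall-clock seconds in *elapsed,
 * and return whatever fn returned. */
static int time_call(timed_fn fn, void *ctx, double *elapsed)
{
    struct timespec start, end;

    assert(fn != NULL && elapsed != NULL);

    clock_gettime(CLOCK_MONOTONIC, &start);
    int rc = fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return rc;
}

/* Hypothetical stand-in for a slow library call: sleeps for the number
 * of milliseconds stored in *ctx.  Assumption: in a real test this body
 * would call drmaa_init() or drmaa_run_job() instead of sleeping. */
static int fake_slow_call(void *ctx)
{
    long ms = *(long *)ctx;
    struct timespec delay = {
        .tv_sec  = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L
    };

    return nanosleep(&delay, NULL);
}
```

With a real wrapper in place of fake_slow_call, calling
time_call(wrapper, &args, &secs) and printing secs should show how much of
the total belongs to each call.  The sketch uses CLOCK_MONOTONIC because
the interesting number here is wall-clock time: the process spends most of
its time blocked, not using CPU.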

Daniel

Ron Chen wrote:

>But from my debug dump, you can see that the second
>thread blocks but it doesn't get anything from
>qmaster:
>
>"../libs/evc/sge_event_client.c 2695 commlib returns got no message"
>
>So does the second thread time out while waiting?
>
> -Ron
>
>--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
>  
>
>>Duh.  I should have known that.  I wrote it. ;)
>>
>>The reason for the difference is that qsub is single-threaded unless a
>>-sync or -now option is used.  DRMAA is always multi-threaded.  When the
>>second thread is created, it requests the current list of running jobs
>>from the qmaster.  It and the main thread block until that list is
>>received.
>>
>>OK, so we've accounted for 2 of 3.5 seconds.  The other 1.5 must be in
>>the job templates and the drmaa_run_job() call.  The next interesting
>>test would be to time how long the drmaa_run_job() call takes.  I
>>suspect it's less than 0.1 seconds.
>>
>>Daniel
>>
>>Ron Chen wrote:
>>
>>>I looked into this problem today, here's what I found:
>>>
>>>qsub and drmaa_init() both eventually call
>>>japi_init(), but drmaa_init() takes 2 seconds:
>>>
>>>% time ./a.out
>>>0.000u 0.002s 0:01.96 0.0%      0+0k 0+0io 0pf+0w
>>>
>>>=======================================================
>>>#include "drmaa.h"
>>>
>>>int main (int argc, char **argv) {
>>>  char error[DRMAA_ERROR_STRING_BUFFER];
>>>
>>>  /* Only session start-up is timed here. */
>>>  drmaa_init (NULL, error, DRMAA_ERROR_STRING_BUFFER);
>>>  return 0;
>>>}
>>>=======================================================
>>>
>>>I looked into why japi_init() can make such a
>>>difference, and found that qsub passes false for the
>>>"bool enable_wait" parameter, while drmaa_init()
>>>passes true.
>>>
>>>And if I change enable_wait to false, it becomes:
>>>
>>>% time ./a.out
>>>0.000u 0.001s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
>>>
>>>But then I read that enable_wait is needed to allow
>>>japi_wait() and japi_synchronize() to function.
>>>
>>>And looking into why it took so much time waiting, I
>>>turned on tracing:
>>>
>>>--> gdi_receive_sec_message() {
>>><-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 190 }
>>>--> sge_log() {  <- **most of the wait time is here**
>>>   ../libs/evc/sge_event_client.c 2695 commlib returns got no message
>>><-- get_event_list() ../libs/evc/sge_event_client.c 2713 }
>>>--> sge_get_qmaster_port() {
>>>
>>>Stack trace:
>>>get_event_list()
>>>ec_get()
>>>japi_implementation_thread()
>>>start_thread ()
>>>
>>>So commlib was waiting for qmaster to return data but
>>>found none?
>>>
>>>-Ron
>>>
>>>
>>>--- Daniel Templeton <Dan.Templeton at Sun.COM> wrote:
>>>
>>>>That's interesting.  I don't see why a mutex lock
>>>>would be expensive 
>>>>when there's no contention...  Something else to
>>>>look into when I have time.
>>>>
>>>>Daniel
>>>>
>>>>Fred L Youhanaie wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>Here is something that I've noticed, in case it is helpful.
>>>>>
>>>>>A few weeks ago I was playing with the DRMAA C API on my laptop and I
>>>>>noticed a 2 second delay when running the simple init/exit example
>>>>>from http://gridengine.sunsource.net/project/gridengine/howto/drmaa.html
>>>>>
>>>>>When run via strace, most of the delay was in the two futex syscalls
>>>>>called from drmaa_init.  I have SGE 6.0u1, FC2, 2.6.9, no bdb, qmaster
>>>>>and execd are all on the same host.
>>>>>
>>>>>HTH
>>>>>
>>>>>Cheers
>>>>>f.
>>>>>
>>>>>
>>>>>Kevin Ruland wrote:
>>>>>
>>>>>>All,
>>>>>>
>>>>>>My situation is a little more complex than I first let on.  First, I'm
>>>>>>not using DRMAA directly, but rather through Python SWIG wrappers.  So
>>>>>>I need to figure out what I'm doing in there, recode in straight C, and
>>>>>>see if there is any significant difference.  So I don't have a nice
>>>>>>test case yet.
>>>>>>
>>>>>>I also don't have hard numbers.  It's all a feeling now.
>>>>>>
>>>>>>Kevin
>>>>>>
>>>>>>Ron Chen wrote:
>>>>>>
>>>>>>>Kevin,
>>>>>>>
>>>>>>>Do you have a testcase?
>=== message truncated ===
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>

-- 
***************************************************
*        Daniel Templeton   ERGB01 x60220         *
*       Staff Engineer, Sun N1 Grid Engine        *
***************************************************
* "Roads? Where we're going we don't need roads." *
*                    -Dr. Emmett Brown            *
*                     Back to the Future (1985)   *
***************************************************


