[GE users] Long delay when submitting large jobs

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Mon Jan 17 12:40:04 GMT 2005



Craig Tierney wrote:

>On Fri, 2005-01-14 at 12:16, Hung-sheng Tsao wrote:
>
>>Just trying to understand:
>>1) How big is the binary? I assume all binaries are NFS-mounted?
>
>This is all before the job script is run, so it has nothing to
>do with the binary.
>
>>2) How big are the input files?
>
>See above.
>
Sorry, but I did not find that in the earlier emails. Could you help me?

>
>>3) I assume the interconnect is gigabit, and that the server has a
>>single gigabit link serving 128 nodes (256 CPUs) in a flat network?
>
>Each rack (22 nodes) is GigE to a switch, which is then uplinked
>to a master switch.
>
>>4) What is the storage? A hardware RAID array?
>
>NFS currently.
>
In an earlier email you stated that you use classic spooling over NFS.
To put it simply: that is bad for performance. :-)

Is there a reason for not using local BDB spooling? During job start
a lot of objects are modified, and they are all spooled....

The execds should also spool locally. Is there a reason for not doing that?
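
If you want to double-check what the cluster is doing right now: the qmaster
spooling method is recorded in the bootstrap file, and the execd spool
directory is part of the global configuration. A rough sketch (the paths
assume a default 6.0 layout and may differ on your installation):

    # qmaster spooling method: "classic" vs. "berkeleydb"
    grep spooling $SGE_ROOT/$SGE_CELL/common/bootstrap

    # execd spool directory -- for local execd spooling this should point
    # to a local filesystem on every execution host, not to an NFS mount
    qconf -sconf | grep execd_spool_dir

As far as I know, switching the qmaster spooling method is a
(re)installation-time decision, so it is not just a one-line change.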

>
>>5) Any staging activity between the master and compute nodes?
>
>No.
>
>I don't care if my job takes 10 minutes to start.  That isn't
>the problem.  It is that the batch system hangs during this time,
>which it should not do.  It is not dependent on the type of job,
>just the number of CPUs (nodes) used.
>
>Thanks,
>Craig
>
>>regards
>
>It has nothing to do with the binary.  This is the time
>before the job script is actually launched.  I don't even
>think this time covers the prolog/epilog execution.  My
>prolog/epilog can run long (touches all nodes in parallel), but
>the batch system shouldn't be waiting on that.
>
>Craig
>
>>On Fri, 14 Jan 2005 11:29:58 -0700, Craig Tierney <ctierney at hpti.com> wrote:
>>
>>>I have been running SGE 6.0u1 for a few months now on a new system.
>>>I have noticed very long delays, or even SGE hangs, when starting
>>>large jobs.  I just tried this on the latest CVS source and
>>>the problem persists.
>>>
>>>It appears that the hang occurs while the job is moved from 'qw' to 't'.
>>>In general the system does continue to operate normally.  However,
>>>the delays can be large, 30-60 seconds.  'Hang' means that system
>>>commands like qsub and qstat stall until the job has finished
>>>migrating to the 't' state.  Sometimes the delays are long enough
>>>to cause GDI failures.  Since qmaster is threaded, I wonder why I
>>>get the hangs.
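
Just a side note: the stall is easy to quantify from a second shell while a
big job is being dispatched. Something along these lines should do it (the
PE name "mpi" and the slot count are only placeholders for whatever your
site uses):

    # submit a do-nothing binary job across many slots
    qsub -b y -pe mpi 256 /bin/sleep 120

    # in another shell, time qstat while the job moves from 'qw' to 't'
    while true; do
        /usr/bin/time -p qstat > /dev/null
        sleep 1
    done

If the long qstat times line up with the 'qw' to 't' transition, that would
at least confirm that the GDI requests to the qmaster are what is blocked,
not the scheduler itself.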
>>>
>>>I have tried debugging the situation.  Either the hang is in qmaster,
>>>or sge_schedd is not printing enough information.
>>>
>>>Here is some of the output from the sge_schedd debug trace for a
>>>256 CPU job using a cluster queue.
>>>
>>> 79347   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0129 R=slots U=2.000000
>>> 79348   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0130 R=slots U=2.000000
>>> 79349   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0131 R=slots U=2.000000
>>> 79350   7886 16384     Found NOW assignment
>>> 79351   7886 16384     reresolve port timeout in 536
>>> 79352   7886 16384     returning cached port value: 536
>>>scheduler tries to schedule job 179999.1 twice
>>> 79353   7886 16384        added 0 ticket orders for queued jobs
>>> 79354   7886 16384     SENDING 10 ORDERS TO QMASTER
>>> 79355   7886 16384     RESETTING BUSY STATE OF EVENT CLIENT
>>> 79356   7886 16384     reresolve port timeout in 536
>>> 79357   7886 16384     returning cached port value: 536
>>> 79358   7886 16384     ec_get retrieving events - will do max 3 fetches
>>>
>>>The hang happens after line 79352.  In this instance the message
>>>indicates the scheduler tried to schedule the job twice.  Other
>>>times, I get a timeout at this point.  In either case, the output
>>>pauses in the same manner that a call to qsub or qstat would.
>>>
>>>I have followed the optimization procedures listed on the website
>>>and they didn't seem to help (might have missed some though).
>>>
>>>I don't have any information from sge_qmaster.  I tried several
>>>different SGE_DEBUG_LEVEL settings, but sge_qmaster would always
>>>stop providing information after daemonizing.
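
The trace goes to stderr, and that is lost as soon as the daemon forks into
the background, which is probably why you see nothing after daemonizing. A
rough sketch of running sge_qmaster in the foreground; SGE_ND (the "no
daemonize" developer switch) and the exact meaning of the SGE_DEBUG_LEVEL
fields may differ between releases, so treat this only as a starting point:

    # stop the running qmaster first (on the qmaster host)
    qconf -km

    # keep the daemon in the foreground instead of forking
    export SGE_ND=1
    # eight fields, one per debug layer; "3" enables an info-level trace
    export SGE_DEBUG_LEVEL="3 0 0 0 0 0 0 0"

    $SGE_ROOT/bin/$($SGE_ROOT/util/arch)/sge_qmaster 2> /tmp/qmaster_trace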
>>>
>>>System configuration:
>>>
>>>Qmaster runs on Fedora Core 2, x86 (2.2 GHz Xeon)
>>>Clients (execd) run on SUSE 9.1, x86_64 (3.2 GHz EM64T)
>>>SGE is configured to use old-style (classic) spooling over NFS
>>>
>>>I can provide more info; I just don't know where to go from here.
>>>
>>>Thanks,
>>>Craig


