[GE users] Long delay when submitting large jobs

Ron Chen ron_chen_123 at yahoo.com
Tue Jan 18 01:54:58 GMT 2005


Before you upgraded to SGE 6.0, did you see similar
problems with SGE 5.3?

If SGE 5.3 works fine, maybe it's related to the new
threaded communication library (SGE 5.3 uses commd).

(pure guessing)

 -Ron


--- Craig Tierney <ctierney at hpti.com> wrote:
> I reinstalled SGE temporarily using BDB to see if that
> would improve startup times.  It took about 75 seconds
> for a 512 processor job to transition from 'qw' to 't'.
> 
> The server was running BDB and it was installed on the local disk.
> The $SGE_ROOT/default was still on NFS because the other nodes
> do not have disks.  However, the IO to the filesystem from the clients
> is small.
> 
> I ran strace on one of the sge_qmaster processes to try and
> see what is going on.  I can't pick out exactly what is going
> on, but I did see that /etc/hosts was mmap'ed once for each
> node.  I know that /etc/hosts should be cached, but I don't see
> why gethostbyname (or whichever function it is) needs to be called
> directly for each host.  The file shouldn't be changing during a
> job startup.
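
To illustrate the caching idea in the paragraph above, here is a minimal
sketch of a per-process host name cache. It is not how sge_qmaster or the
communication library resolves names; the class name HostCache and its
lookups counter are made up for illustration, and the only real call used
is Python's socket.gethostbyname.

```python
import socket


class HostCache:
    """Resolve each host name at most once (hypothetical helper)."""

    def __init__(self, resolver=socket.gethostbyname):
        # The resolver is injectable so the cache can be exercised
        # without touching /etc/hosts or DNS.
        self._resolver = resolver
        self._cache = {}
        self.lookups = 0  # number of real resolver calls made

    def resolve(self, host):
        # Only call the resolver the first time a name is seen.
        if host not in self._cache:
            self.lookups += 1
            self._cache[host] = self._resolver(host)
        return self._cache[host]

    def resolve_all(self, hosts):
        # Map every host in a job's node list to its address.
        return {host: self.resolve(host) for host in hosts}
```

With a cache like this, starting a job on N nodes costs at most N lookups
the first time and none afterwards, as long as the host file really does
not change while the daemon is running.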
> 
> There were many other mmap/munmap calls as well as calls to
> gettimeofday.  However, I couldn't correlate them with exactly what
> it was doing.
> 
> When qmaster starts up a job, does it talk to each host, one by
> one, setting up the job information?  The scheduler actually picks
> the nodes used, correct?  If qmaster is talking to each node,
> is it done serially or are multiple requests sent out simultaneously?
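
To make the serial versus simultaneous distinction concrete, here is a
small sketch. It says nothing about what qmaster actually does; notify_host
is a hypothetical stand-in that just sleeps to simulate one round trip to
an execd, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def notify_host(host, delay=0.01):
    """Hypothetical stand-in for delivering job information to one node."""
    time.sleep(delay)  # simulated network round trip
    return host, "ok"


def dispatch_serial(hosts):
    # One host after another: total time grows linearly with node count.
    return [notify_host(host) for host in hosts]


def dispatch_concurrent(hosts, workers=32):
    # Up to `workers` requests in flight at once; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(notify_host, hosts))
```

Both functions return the same list of results. With a fixed per-host
round trip, the serial version takes roughly the number of hosts times
that round trip, while the concurrent one takes roughly that divided by
the number of workers. A delay that scales with the number of nodes would
be consistent with serial delivery, but would not prove it.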
> 
> Thanks,
> Craig
> 
> 
> 
> 
> 
> > 
> > The execd should also spool locally. What is the reason for not doing
> > it?
> > >   
> > > > 5) any staging activity between master and compute nodes?
> > > >     
> > > No.
> > > 
> > > I don't care if my job takes 10 minutes to start.  That isn't
> > > the problem.  It is that the batch system hangs during this time.
> > > That it should not do.  It is not dependent on the type of job,
> > > just the number of cpus (nodes) used.
> > > 
> > > Thanks,
> > > Craig
> > > 
> > > 
> > > 
> > > 
> > >   
> > > > regards
> > > > 
> > > > 
> > > >     
> > > It has nothing to do with the binary.  This is the time
> > > before the job script is actually launched.  I don't even
> > > think this time covers the prolog/epilog execution.  My
> > > prolog/epilog can run long (touches all nodes in parallel), but
> > > the batch system shouldn't be waiting on that.
> > > 
> > > Craig
> > > 
> > > 
> > > 
> > >   
> > > > On Fri, 14 Jan 2005 11:29:58 -0700, Craig Tierney <ctierney at hpti.com> wrote:
> > > >     
> > > > > I have been running SGE6.0u1 for a few months now on a new system.
> > > > > I have noticed very long delays, or even SGE hangs, when starting
> > > > > large jobs.  I just tried this on the latest CVS source and
> > > > > the problem persists.
> > > > > 
> > > > > It appears that the hang occurs while the job is moved from 'qw' to 't'.
> > > > > In general the system does continue to operate normally.  However
> > > > > the delays can be large, 30-60 seconds.  By 'hang' I mean that
> > > > > system commands like qsub and qstat block until the job
> > > > > has finished moving to the 't' state.  Sometimes the delays
> > > > > are long enough to get GDI failures.  Since qmaster is threaded,
> > > > > I wonder why I get the hangs.
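
One way to put numbers on the 'hang' described above is to time a client
command repeatedly while a large job is starting. The sketch below is only
an assumption about how one might measure it; time_command and sample are
hypothetical helper names, and nothing in it is specific to SGE beyond
passing a command such as qstat as the argument list.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=120.0):
    """Run argv once; return (elapsed_seconds, return_code or None)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None  # the command never answered within the timeout
    return time.monotonic() - start, code


def sample(argv, count=5, interval=1.0):
    """Time the same command `count` times, pausing between runs."""
    results = []
    for _ in range(count):
        results.append(time_command(argv))
        time.sleep(interval)
    return results
```

For example, calling sample(["qstat"], count=60) right after submitting a
large job would show how the response time changes while the job moves
from 'qw' to 't', assuming qstat is on the PATH of the submitting host.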
> > > > > 
> > > > > I have tried debugging the situation.  Either the hang is in qmaster,
> > > > > or sge_schedd is not printing enough information.
> > > > > 
> > > > > Here is some of the text from the sge_schedd debug for a 256 cpu job
> > > > > using a cluster queue.
> > > > > 
> > > > >  79347   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0129 R=slots U=2.000000
> > > > >  79348   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0130 R=slots U=2.000000
> > > > >  79349   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 L=Q O=qecomp.q at e0131 R=slots U=2.000000
> > > > >  79350   7886 16384     Found NOW assignment
> > > > >  79351   7886 16384     reresolve port timeout in 536
> > > > >  79352   7886 16384     returning cached port value: 536
> > > > > scheduler tries to schedule job 179999.1 twice
> > > > >  79353   7886 16384        added 0 ticket orders for queued jobs
> > > > >  79354   7886 16384     SENDING 10 ORDERS TO QMASTER
> > > > >  79355   7886 16384     RESETTING BUSY STATE OF EVENT CLIENT
> > > > >  79356   7886 16384     reresolve port timeout in 536
> > > > >  79357   7886 16384     returning cached port value: 536
> > > > >  79358   7886 16384     ec_get retrieving events - will do max 3 fetches
> > > > > 
> > > > > The hang happens after line 79352.  In this instance the message
> > > > > indicates the scheduler tried twice.  Other times, I get a timeout
> > > > > at this point.  In either case, the output pauses in the same
> > > > > manner that a call to qsub or qstat would.
> > > > > 
> > > > > I have followed the optimization procedures listed on the website
> > > > > and they didn't seem to help (might have missed some though).
> > > > > 
> > > > > I don't have any information from sge_qmaster.  I tried several
> > > > > different SGE_DEBUG_LEVEL settings, but sge_qmaster would always
> > > > > stop providing information after daemonizing.
> > > > > 
> > > > > System configuration:
> > > > > 
> > > > > Qmaster runs on Fedora Core 2, x86, (2.2 GHz Xeon)
> > > > > clients (execd) run on Suse 9.1 x86_64, (3.2 GHz EM64T)
> > > > > SGE is configured to use old style spooling over NFS
> > > > > 
> > > > > I can provide more info, I just don't know where to go from here.
> > > > > 
> > > > > Thanks,
> 
=== message truncated ===



		