[GE users] Long delay when submitting large jobs

Craig Tierney ctierney at hpti.com
Tue Jan 18 17:15:59 GMT 2005


On Tue, 2005-01-18 at 09:11, Stephan Grell wrote:
> Hello,
> 
> I did some testing myself and got some surprising times (reported
> via qping).  I started 3 MPI jobs of different sizes.  The job
> start-up times are:
> - pe mpi 300 : 41s
> - pe mpi 340 : 53s
> - pe mpi 350 : 57s
> 
> During this time the qmaster is not processing any requests.  I have
> not done any debugging yet, but it looks worse than I expected.

This is the same thing I am seeing.

I haven't debugged it much, but I did see that the qmaster
talks to each node when transitioning the job.  It does this
serially, so it could be faster.  That alone doesn't explain why the
rest of the system blocks, though.
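To put a rough number on the serial part (a toy sketch only, not SGE's
actual code; the per-host round-trip time is an assumption):

```python
import concurrent.futures
import time

RTT = 0.01  # assumed per-host round-trip time in seconds; illustrative only

def notify(host):
    """Stand-in for the qmaster -> execd job-delivery message."""
    time.sleep(RTT)
    return host

hosts = ["e%04d" % n for n in range(128)]

start = time.time()
for h in hosts:            # serial: cost grows linearly with host count
    notify(h)
serial = time.time() - start

start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(notify, hosts))
parallel = time.time() - start

print("serial:   %.2fs" % serial)    # roughly 128 * RTT
print("parallel: %.2fs" % parallel)  # roughly (128/32) * RTT
```

With 128 hosts the serial loop pays 128 round trips back to back, which
is consistent with start-up time growing with job size.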

Craig

> 
> I had a dual sparc III ~1 GHz machine with BDB spooling and 175 exec hosts.
> 
> Stephan
> 
> -
> 
> Eric Whiting wrote:
> 
> > Similar setup here. Similar problem.
> >
> > From a usability standpoint it is a 'bad thing' to see these very long 
> > qstat/qsub hangs. qstat is something that users want to type and see a 
> > response.  When it takes more than 10s they think it is sick. When it 
> > takes more than 60s they think it is broken.  We are seeing qstat hang 
> > longer than 60s.
> >
> > I've turned on scheduler profiling. I've noticed that the schedd time 
> > numbers are sometimes less than the actual 'qstat hang time'.  I'll 
> > qsub a job. I'll wait for a few seconds and then I'll do a 
> > date;qstat;date. Then I wait. The time diff is often more than the 
> > schedd run time reported in the log file.   Here are two recent 
> > entries from the logs.
> >
> > 01/18/2005 07:33:55|schedd|master|I|PROF: schedd run took: 98.970 s 
> > (init: 0.000 s, copy: 0.130 s, run:98.750, free: 0.090 s, jobs: 16, 
> > categories: 4)
> > 01/18/2005 07:46:16|schedd|master|I|PROF: schedd run took: 99.630 s 
> > (init: 0.000 s, copy: 0.130 s, run:99.410, free: 0.090 s, jobs: 19, 
> > categories: 4)
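Those PROF lines are easy to tabulate for comparison against the
observed hang times; a quick sketch (the regex is reverse-engineered
from the two log lines above, and the group names are my own labels):

```python
import re

# Matches the "PROF: schedd run took" lines quoted above.
PROF_RE = re.compile(
    r"PROF: schedd run took: (?P<total>[\d.]+) s "
    r"\(init: (?P<init>[\d.]+) s, copy: (?P<copy>[\d.]+) s, "
    r"run:(?P<run>[\d.]+), free: (?P<free>[\d.]+) s, "
    r"jobs: (?P<jobs>\d+), categories: (?P<cats>\d+)\)"
)

lines = [
    "01/18/2005 07:33:55|schedd|master|I|PROF: schedd run took: 98.970 s "
    "(init: 0.000 s, copy: 0.130 s, run:98.750, free: 0.090 s, jobs: 16, "
    "categories: 4)",
]

for line in lines:
    m = PROF_RE.search(line)
    if m:
        # Print total scheduler time and job count for each run.
        print(m.group("total"), m.group("jobs"))
```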
> >
> > I'm watching NFS traffic and don't see much of it, so this might be 
> > something else.
> >
> > qmaster: sun 4800, solaris 9
> > NFS spooling
> > N1GE 6.0U2
> > 230 linux execd hosts (v20z)
> >
> > eric
> >
> > Craig Tierney wrote:
> >
> >>
> >>
> >> Qmaster has to talk to 128+ nodes.  On systems like this, it is probably
> >> done linearly and just takes a bunch of time.  The issue is: why does the
> >> multi-threaded qmaster block while doing this?  Why can't another thread
> >> respond to requests?  If not for all operations, at least for actions
> >> that only read the database.
> >>
> >> Thanks,
> >> Craig
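The split suggested above (exclusive access for the job-start
transaction, shared access for read-only requests) is the classic
reader-writer pattern.  A toy Python sketch, purely illustrative, with
no claim that qmaster's internals look anything like this:

```python
import threading

class RWLock:
    """Minimal reader-writer lock: many readers OR one writer."""

    def __init__(self):
        self._readers = 0
        self._lock = threading.Lock()
        self._no_readers = threading.Condition(self._lock)

    def acquire_read(self):
        with self._lock:
            self._readers += 1

    def release_read(self):
        with self._lock:
            self._readers -= 1
            if self._readers == 0:
                self._no_readers.notify_all()

    def acquire_write(self):
        self._lock.acquire()          # held for the whole write section
        while self._readers:
            self._no_readers.wait()   # wait until readers drain

    def release_write(self):
        self._lock.release()

lock = RWLock()
results = []

def read(name):
    lock.acquire_read()
    results.append(name)          # many readers may be in here at once
    lock.release_read()

def write():
    lock.acquire_write()
    results.append("dispatch")    # exclusive: no reader runs concurrently
    lock.release_write()

threads = [threading.Thread(target=read, args=("qstat-%d" % i,))
           for i in range(4)]
threads.append(threading.Thread(target=write))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

Under a scheme like this, qstat-style reads would only stall for the
duration of the exclusive section itself, not for the whole job start.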
> >>
> >>
> >>  
> >>
> >>> Cheers,
> >>> Stephan
> >>>
> >>> Craig Tierney wrote:
> >>>
> >>>   
> >>>
> >>>> I have been running SGE6.0u1 for a few months now on a new system.
> >>>> I have noticed very long delays, or even SGE hangs, when starting
> >>>> large jobs.  I just tried this on the latest CVS source and
> >>>> the problem persists.
> >>>>
> >>>> It appears that the hang occurs while the job is moved from 'qw'
> >>>> to 't'.  In general the system does continue to operate normally.
> >>>> However, the delays can be large, 30-60 seconds.  By 'hang' I mean
> >>>> that commands like qsub and qstat stall until the job has finished
> >>>> migrating to the 't' status.  Sometimes the delays are long enough
> >>>> to trigger GDI failures.  Since qmaster is threaded, I wonder why
> >>>> I get the hangs.
> >>>>
> >>>> I have tried debugging the situation.  Either the hang is in qmaster,
> >>>> or sge_schedd is not printing enough information.
> >>>>
> >>>> Here is some of the text from the sge_schedd debug for a 256 cpu job
> >>>> using a cluster queue.
> >>>>
> >>>> 79347   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 
> >>>> L=Q O=qecomp.q@e0129 R=slots U=2.000000
> >>>> 79348   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 
> >>>> L=Q O=qecomp.q@e0130 R=slots U=2.000000
> >>>> 79349   7886 16384     J=179999.1 T=STARTING S=1105726988 D=43200 
> >>>> L=Q O=qecomp.q@e0131 R=slots U=2.000000
> >>>> 79350   7886 16384     Found NOW assignment
> >>>> 79351   7886 16384     reresolve port timeout in 536
> >>>> 79352   7886 16384     returning cached port value: 536
> >>>> scheduler tries to schedule job 179999.1 twice
> >>>> 79353   7886 16384        added 0 ticket orders for queued jobs
> >>>> 79354   7886 16384     SENDING 10 ORDERS TO QMASTER
> >>>> 79355   7886 16384     RESETTING BUSY STATE OF EVENT CLIENT
> >>>> 79356   7886 16384     reresolve port timeout in 536
> >>>> 79357   7886 16384     returning cached port value: 536
> >>>> 79358   7886 16384     ec_get retrieving events - will do max 3 
> >>>> fetches
> >>>>
> >>>> The hang happens after line 79352.  In this instance the message
> >>>> indicates the scheduler tried twice.  Other times, I get a timeout
> >>>> at this point.  In either case, the output pauses in the same
> >>>> manner that a call to qsub or qstat would.
> >>>>
> >>>> I have followed the optimization procedures listed on the website
> >>>> and they didn't seem to help (might have missed some though).
> >>>>
> >>>> I don't have any information from sge_qmaster.  I tried several
> >>>> different SGE_DEBUG_LEVEL settings, but sge_qmaster would always
> >>>> stop providing information after daemonizing.
> >>>>
> >>>> System configuration:
> >>>>
> >>>> Qmaster runs on Fedora Core 2, x86, (2.2 Ghz Xeon)
> >>>> clients (execd) run on Suse 9.1 x86_64, (3.2 Ghz EM64T)
> >>>> SGE is configured to use old style spooling over NFS
> >>>>
> >>>> I can provide more info, I just don't know where to go from here.
> >>>>
> >>>> Thanks,
> >>>> Craig
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>>>
> >>>>
> >>>>
> >>>>     
> >>>
> >>>
> >>>   
> >>
> >>
> >>
> >>
> >>  
> >>
> >
> >
> >
> 
> 
> 

