[GE users] Startup times and other issues with 6.0u3

Brian R Smith brian at cypher.acomp.usf.edu
Tue Mar 22 17:43:55 GMT 2005


To all:

Thanks for all the help.  We found that the cause of the problem was a
few bad NICs and some other flaky hardware.  SGE and our parallel codes
have been running smoothly since we fixed those issues.
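
For anyone hitting the same symptoms, this is roughly how we spotted
the bad NICs with netperf (the host name below is a placeholder, and
netserver must already be running on the node under test):

************

# on the node being tested
netserver

# from each of the other nodes in turn
netperf -H node01 -l 30

************

Nodes that reported throughput far below wire speed, or that varied
wildly between runs, turned out to be the ones with bad hardware.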

Big thanks to everyone who assisted me.


Brian R Smith

On Tue, 2005-03-22 at 09:44 -0500, Brian R Smith wrote:
> Stephan,
> 
> Don't worry about your main trunk, because our hardware appears to be
> the real cause of this problem.  We've found a number of machines on
> this cluster that perform horribly under netperf.  We just upgraded a
> bunch of hardware last week, so we're still ironing out some bugs.  The
> same problems show up with 6.0u1 because of these hardware issues.
> 
> Brian
> 
> On Tue, 2005-03-22 at 08:50 +0100, Stephan Grell - Sun Germany - SSG -
> Software Engineer wrote:
> > Brian R Smith wrote:
> > 
> > >Stephan,
> > >
> > >I just downgraded to SGE 6.0u1 to see if u3 was the problem.  Since u1
> > >didn't give me any trouble on our other clusters, I figured it would do
> > >for now.  It looks like u1 doesn't have the -dump option for qping.
> > >
> > You are right, it is not in u1. We are still working on turning qping
> > into a nice debugging tool. I think the dump functionality is part of
> > u3, but it is for sure part of u4.
> > 
> > >************
> > >
> > >[root@mimir ~]# qping -dump mimir 5000 qmaster 1
> > >usage: qping [-i <interval>] [-info] [-f] [-noalias] <host> <port> <name> <id>
> > >   -i       : set ping interval time
> > >   -info    : show full status information and exit
> > >   -f       : show full status information on each ping interval
> > >   -noalias : ignore $SGE_ROOT/SGE_CELL/common/host_aliases file
> > >   host     : host name of running component
> > >   port     : port number of running component
> > >   name     : name of running component (e.g.: "qmaster" or "execd")
> > >   id       : id of running component (e.g.: 1 for daemons)
> > >
> > >example:
> > >qping -info clustermaster 5000 qmaster 1
> > >
> > >************
> > >
> > >We're running a bunch of netperf tests.  It is my belief that this
> > >problem may not be an SGE issue but a problem with some component of
> > >our hardware; to be more specific, I suspect the switch more than
> > >anything.
> > >
> > Doesn't the fact that your problem vanished after going back to u1
> > mean that it is an SGE problem? Could you try the current maintrunk?
> > It would be good to know if the problem is still in the source base.
> > 
> > Stephan
> > 
> > >I'll keep you guys posted on what happens.
> > >
> > >Thanks for all the help.
> > >
> > >Brian
> > >
> > >On Mon, 2005-03-21 at 11:25 +0100, Stephan Grell - Sun Germany - SSG -
> > >Software Engineer wrote:
> > >
> > >>Brian,
> > >>
> > >>sorry for the late reply. I am a bit shocked to read about 7-10 minute
> > >>startup times. The u4 fix might help, but I did not see delays longer
> > >>than 1 minute with much bigger jobs.
> > >>
> > >>You could use qping -dump to monitor the traffic between the qmaster
> > >>and the execd. It will give you time stamps showing which client sent
> > >>what, and when.
> > >>
> > >>Could you post the output for an empty cluster while an MPI job is
> > >>starting?
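> > >>
> > >>As a sketch, the invocation would look like this (host and port are
> > >>placeholders; the qmaster host name is in
> > >>$SGE_ROOT/$SGE_CELL/common/act_qmaster):
> > >>
> > >>    qping -dump <qmaster-host> <qmaster-port> qmaster 1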
> > >>
> > >>Stephan
> > >>
> > >>Brian R Smith wrote:
> > >>
> > >>>Reuti,
> > >>>
> > >>>Right, 't' only tells me what nodes have been allocated to run the job.  
> > >>>The job does not start until 'r'.  Makes perfect sense.
> > >>>However, I can attest to the 7-10 minute wait times.  When 
> > >>>tight-integration is turned off, processes start up within a couple of 
> > >>>seconds (plus the time it takes for the scheduler to "make its rounds").
> > >>>
> > >>>Brian
> > >>>
> > >>>Reuti wrote:
> > >>>
> > >>>>Hi Brian,
> > >>>>
> > >>>>the status 't' is *not* a real-time display of whether the job is
> > >>>>generating any CPU load. But I must admit that I only saw a delay of
> > >>>>about 1-2 minutes before it changed to 'r'. Maybe it's related to the
> > >>>>PE startup delay in u3.
> > >>>>
> > >>>>Once the job is started, it may already be working although the
> > >>>>status is still 't'. It is more informative to look at the CPU usage
> > >>>>on the node with "top" or "ps -e f -o pid,time,command".
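> > >>>>
> > >>>>With tight integration the MPI tasks should also appear in that
> > >>>>process tree underneath sge_execd, roughly like this (job id and
> > >>>>task names are placeholders):
> > >>>>
> > >>>>  sge_execd
> > >>>>   \_ sge_shepherd-<jobid>
> > >>>>       \_ <your MPI task>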
> > >>>>
> > >>>>CU - Reuti
> > >>>>
> > >>>>Quoting Brian R Smith <brian at cypher.acomp.usf.edu>:
> > >>>>
> > >>>>>Sean,
> > >>>>>
> > >>>>>That is exactly what happens: allocation occurs and the job waits in
> > >>>>>'t' state for 7-10 minutes.  I've re-enabled "control slaves" because
> > >>>>>I figured I could live with this problem until u4 comes out (not that
> > >>>>>many people run 42-node, cluster-spanning jobs).  My big concern right
> > >>>>>now is with running MM5 under SGE, as there seem to be some problems
> > >>>>>with message passing.
> > >>>>>
> > >>>>>Brian
> > >>>>>
> > >>>>>Sean Dilda wrote:
> > >>>>>
> > >>>>>>Ron Chen wrote:
> > >>>>>>
> > >>>>>>>--- Brian R Smith <brian at cypher.acomp.usf.edu> wrote:
> > >>>>>>>
> > >>>>>>>>You are absolutely the man.  Setting "control
> > >>>>>>>>slaves" to false fixed all of my problems.
> > >>>>>>>
> > >>>>>>>No, it is not fixing anything!
> > >>>>>>>
> > >>>>>>>"control slaves" means non-tight integration, so you
> > >>>>>>>won't get process control/accounting of the slaves MPI
> > >>>>>>>tasks.
> > >>>>>>>
> > >>>>>>>In SGE 6 update 4, the slow start problem was fixed.
> > >>>>>>>But the original problem was that starting a 400-node
> > >>>>>>>parallel job with tight integration takes several tens
> > >>>>>>>seconds or something. But for your case it takes 10
> > >>>>>>>minutes! So there is still something going on with
> > >>>>>>>your configuration.
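> > >>>>>>>
> > >>>>>>>(For reference, "control slaves" is the control_slaves field of
> > >>>>>>>the parallel environment configuration. A minimal way to check it,
> > >>>>>>>assuming your PE is named "mpi":
> > >>>>>>>
> > >>>>>>>    qconf -sp mpi | grep control_slaves
> > >>>>>>>
> > >>>>>>>and "qconf -mp mpi" to edit it.)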
> > >>>>>>
> > >>>>>>I've seen delays on the order of 5 minutes with 30- and 40-CPU jobs
> > >>>>>>that I believe are related to the bug that's fixed in u4.  I think
> > >>>>>>the people who only saw 10- or 20-second delays were lucky.
> > >>>>>>
> > >>>>>>Brian, when you say delay, what do you mean?  Is the job allocated
> > >>>>>>nodes, but sitting in 't' state for 10 minutes before it switches to
> > >>>>>>'r'?  If so, then it does sound like the bug that will be fixed when
> > >>>>>>u4 comes out.  However, Ron is right: turning off control slaves
> > >>>>>>doesn't "fix" it, unless you don't care about tight integration.
> > >>>>>>
-- 
_______________________________________
|  Brian R Smith                      |
|  Systems Administrator              |
|  Research Computing Core Facility   |
|  University of South Florida        |
|  Phone: 1(813)974-1467              |
|  4202 E Fowler Ave, LIB 613         |
_______________________________________


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



