[GE users] Long delay when submitting large jobs

Craig Tierney ctierney at hpti.com
Mon Feb 7 18:44:42 GMT 2005


On Sun, 2005-02-06 at 20:26, Rayson Ho wrote:
> >It isn't fixed though.  The qmaster is serializing tasks and
> >blocking on communication to execd's.  If you want a system
> >to scale, it shouldn't block.
> 
> Agreed... that's why I put quotes around "fixes" in the other mail.
> 
> IMO, there are 2 ways to fix it:
> 
> 1) let 1 thread start SGE's rshd for tight PE jobs, and another one
> handle qstat, qsub, etc. I know qmaster is threaded, but I don't know how
> we currently use the threads.
> 
My problem (I think) has nothing to do with the PE being
tight or loose.  The problem is that when migrating between
'qw' and 't', the server talks to every node (when control_slaves is
TRUE).  During this time, the server cannot respond to other
requests for information, like qsub.  The server shouldn't block.
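To make the cost concrete, here is a small standalone sketch (not SGE code; the hosts and the latency figure are invented) of serial, blocking dispatch to N execd's versus handing the round trips to worker threads:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def notify_execd(host):
    """Stand-in for one blocking qmaster -> execd round trip."""
    time.sleep(0.05)  # pretend network latency
    return host

hosts = [f"node{i:03d}" for i in range(20)]

# Serial dispatch: wall time ~ n_hosts * latency, during which the
# caller can answer no other request (qsub, qstat, ...).
t0 = time.monotonic()
for h in hosts:
    notify_execd(h)
serial = time.monotonic() - t0

# Concurrent dispatch: wall time ~ one latency, leaving the main
# thread free to keep servicing requests.
t0 = time.monotonic()
with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
    list(pool.map(notify_execd, hosts))
concurrent = time.monotonic() - t0

print(f"serial {serial:.2f}s vs concurrent {concurrent:.2f}s")
```

With 20 hosts the serial pass takes roughly 20x the single-call latency; the gap grows linearly with job size, which matches the long delays seen on large jobs.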


> 2) the other way is to add a new layer of software, so that tight PE and
> non-tight PE jobs look the same to the qmaster. The new layer is
> like LSF's PAM or PBS/Torque's mpiexec, which starts the slave parallel
> tasks on the first execution host.
> 
> I played with integrating SGE and mpiexec, I sent this mail to the mpiexec
> list in 2003:
> http://email.osc.edu/pipermail/mpiexec/2003/000521.html
> 
> But it still relies on qmaster to start the rshds on the slave nodes. In
> order to fix the long delay problem, we need to:
> 
> - skip the code in qmaster to start the rshds for tight PE jobs
> - let mpiexec get the list of hosts, and start the parallel tasks 
>   using the TM (Task Management) interface.
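The "let mpiexec get the list of hosts" step is cheap on the SGE side, because the allocation is already written to $PE_HOSTFILE. A hedged sketch of just that step (the spawn via the TM interface is omitted; the hostfile contents below are fabricated, but the host/slots/queue/range line format is SGE's):

```python
import os
import tempfile

def expand_pe_hostfile(path):
    # $PE_HOSTFILE has one line per allocated host:
    #   <hostname> <slots> <queue> <processor range>
    # Expand it into one entry per task, which is the shape an
    # mpiexec-style launcher needs before spawning slave tasks.
    tasks = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            host, nslots = fields[0], int(fields[1])
            tasks.extend([host] * nslots)
    return tasks

# Demo with a fabricated two-host allocation.
with tempfile.NamedTemporaryFile("w", suffix=".pe_hostfile",
                                 delete=False) as f:
    f.write("node001 2 all.q@node001 UNDEFINED\n"
            "node002 1 all.q@node002 UNDEFINED\n")
    hostfile = f.name

tasks = expand_pe_hostfile(hostfile)
os.unlink(hostfile)
print(tasks)  # -> ['node001', 'node001', 'node002']
```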

Isn't the rshd process only captured when there is a call to rsh
or ssh?  That is, while the job script is already running.  The
delay I'm seeing happens before that.

Craig

> 
> Hmm, these 2 require less than 1 day of work :)
> 
> And then we need to write a new TM library for SGE... this can be several
> weeks of work!
> 
> The other advantage is that people wouldn't need to configure start_proc_args,
> stop_proc_args and the other details for the tight MPI PE; also, whatever is
> supported by mpiexec we would get for free!! (eg. LAM, MPICH, MPICH2,
> EMP, etc)
> 
> 
> You can take a look at this diagram:
> http://www.ms.washington.edu/Docs/LSF/LSF_4.2_Manual/parallel_4.2/images/pll-01-introa.gif
> 
> And mpiexec is very similar, but I couldn't find a nice diagram on the web
> which describes the architecture.
> 
> Rayson
> 
> 
> >Craig
> >
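For context, the tight-PE knobs Rayson is referring to live in the parallel environment definition (shown by `qconf -sp <pe_name>`); the paths below are site-specific examples, not shipped defaults:

```
pe_name           mpich
slots             64
user_lists        NONE
xuser_lists       NONE
start_proc_args   /usr/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /usr/sge/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
```

With an mpiexec/TM-style launcher, start_proc_args and stop_proc_args could both be set to NONE, which is the configuration saving Rayson mentions.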


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



