[GE users] Long delay when submitting large jobs

Andreas Haas Andreas.Haas at Sun.COM
Mon Feb 7 13:53:15 GMT 2005

On Mon, 17 Jan 2005, Craig Tierney wrote:

> On Mon, 2005-01-17 at 05:40, Stephan Grell - Sun Germany - SSG -
> Software Engineer wrote:
> >
> > Is there a reason for not using local BDB spooling? During job start
> > a lot of objects are modified, and they are all spooled....
>
> I reinstalled SGE temporarily using BDB to see if that
> would improve startup times.  It took about 75 seconds
> for a 512-processor job to transition from 'qw' to 't'.
> The server was running BDB and it was installed on the local disk.
> The $SGE_ROOT/default was still on NFS because the other nodes
> do not have disks.  However, the I/O to the filesystem from the clients
> is small.
>
> I ran strace on one of the sge_qmaster processes to try to
> see what is going on.  I can't pick out exactly what is going
> on, but I did see that /etc/hosts was mmap'ed once for each
> node.  I know that /etc/hosts should be cached, but I don't see
> why gethostbyname (or whichever function it is) needs to be called
> directly for each host.  The file shouldn't be changing during a
> job startup.
>
> There were many other mmap/munmap calls as well as calls to
> gettimeofday.  However, I couldn't correlate them with exactly what
> it was doing.
>
> When qmaster starts up a job, does it talk to each host, one by
> one, setting up the job information?  The scheduler actually picks
> the nodes used, correct?  If qmaster is talking to each node,
> is it done serially or are multiple requests sent out simultaneously?

Hi Craig,

With slave_control=true parallel jobs, qmaster sends out messages
to all slave execution hosts in a loop in sge_give_job(). This
is required to ensure all slave execution daemons know about this
job and accept qrsh -inherit later on. The actual job isn't delivered
to the master execution daemon until all slave execution hosts have
asynchronously acknowledged that they are prepared for the qrsh -inherit.

Upon arrival of the job start order, qmaster just sends out those messages
to the slave execution daemons in sge_give_job(); the acknowledgements from
the execution daemons are handled separately. Any delay during delivery of
a large parallel job must therefore occur in sge_give_job(). I reviewed that
code, but I found no obvious problem, such as qmaster spooling the job each
time a message is sent to an execution daemon.
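
To make the flow above concrete, here is a rough sketch in Python. It is
not the qmaster code: the class, method, and message names are all
hypothetical, and the "sending" is only simulated by appending to a list.
It only illustrates the ordering: one message per slave host in a loop,
acknowledgements handled later and separately, and the job going to the
master host only once no slave is still pending.

```python
# Hypothetical sketch of the dispatch ordering; none of these names
# come from the Grid Engine source code.
class ParallelJobDispatch:
    def __init__(self, master_host, slave_hosts):
        self.master_host = master_host
        self.pending = set(slave_hosts)  # slaves that have not acknowledged yet
        self.sent = []                   # simulated outgoing (host, message) pairs
        self.delivered = False

    def give_job(self):
        # One message per slave host, sent in a plain loop without
        # waiting for any reply inside the loop.
        for host in sorted(self.pending):
            self.sent.append((host, "prepare_slave"))
        self._deliver_if_ready()

    def on_ack(self, host):
        # Called later, whenever an acknowledgement from a slave arrives.
        self.pending.discard(host)
        self._deliver_if_ready()

    def _deliver_if_ready(self):
        # The job goes to the master host only after every slave has acknowledged.
        if not self.pending and not self.delivered:
            self.sent.append((self.master_host, "deliver_job"))
            self.delivered = True
```

In this sketch the loop itself never blocks on a reply, so any time spent
there would have to come from the per-host work done inside the loop.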

Please set SGE_DEBUG_LEVEL *and* SGE_ND to prevent qmaster from

