[GE users] slow responses when large jobs finish

Bernard Li bli at bcgsc.ca
Fri Sep 17 01:11:37 BST 2004


Hi Sean:

I have noticed that if there are a huge number of jobs exiting from SGE,
sge_commd will get hung and there is no way for it to recover.

When users submit a huge number of array jobs (for example 30,000 tasks)
and the jobs either finish really quickly or they are deleted in one
shot, then bad things happen.

We are using 5.3p6 and have stopped using array jobs ever since
(sticking with just simple jobs).

We have a local installation of SGE on each node.

Cheers,

Bernard 

> -----Original Message-----
> From: Sean Dilda [mailto:agrajag at dragaera.net] 
> Sent: Thursday, September 16, 2004 14:22
> To: users at gridengine.sunsource.net
> Subject: [GE users] slow responses when large jobs finish
> 
> Has anyone else noticed slow responses from SGE commands when 
> large jobs are finishing?  I had a user just delete about 4 
> running jobs, each taking up 30 slots (so 120 slots 
> total).  For a few minutes afterwards, sge_qmaster was 
> effectively unresponsive.  Commands like 'qstat' would just 
> sit there until sge_qmaster became responsive again.  I've 
> noticed this kind of behavior before, but it was especially 
> bad this time.  Has anyone else noticed anything of this sort?
> 
> I'm running 6.0u1 with classic spooling (and sge_qmaster's spool is
> over NFS).  I ran strace and it seemed that one of the sge_qmaster threads
> was busy doing a lot of file I/O related to the jobs that 
> were finishing.  This surprises me somewhat as I thought 
> making sge_qmaster threaded was supposed to help with 
> situations like this.  I understand that using NFS will slow 
> things down somewhat, but I can't imagine that 120 slots 
> worth of jobs would cause enough file I/O that sge_qmaster 
> would become effectively unresponsive for several minutes.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net



