[GE users] slow responses when large jobs finish

Sean Dilda agrajag at dragaera.net
Thu Sep 16 22:22:08 BST 2004


Has anyone else noticed slow responses from SGE commands when large jobs
are finishing?  I had a user just delete about 4 running jobs, each of
which was taking up 30 slots (so 120 slots total).  For a few minutes
afterwards, sge_qmaster was effectively unresponsive.  Commands like
'qstat' would just sit there until sge_qmaster became responsive
again.  I've noticed this kind of behavior before, but it was
especially bad this time.  Has anyone else seen anything of this
sort?

I'm running 6.0u1 with classic spooling (and sge_qmaster's spool
directory is on NFS).  I ran strace, and it seemed that one of the
sge_qmaster threads was busy doing a lot of file I/O related to the jobs
that were finishing.  This surprises me somewhat, as I thought making
sge_qmaster threaded was supposed to help with situations like this.  I
understand that using NFS will slow things down somewhat, but I can't
imagine that 120 slots' worth of jobs would cause enough file I/O that
sge_qmaster would become effectively unresponsive for several minutes.
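
For anyone who wants to put numbers on a stall like this, one option is
to time a client command repeatedly while the jobs are finishing.  The
following is only a minimal sketch, not anything that ships with SGE:
the function names time_command and probe, and the default count,
interval and timeout values, are hypothetical and chosen for
illustration.  It uses only the Python standard library and assumes
the command to time (for example plain 'qstat') is on the PATH.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=300.0):
    """Run argv once and return (elapsed_seconds, return_code).

    return_code is None if the command did not finish within `timeout`
    seconds (hypothetical default of 300 seconds).
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,  # only the timing matters here
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = result.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def probe(argv, count=10, interval=5.0):
    """Time argv `count` times, sleeping `interval` seconds between runs.

    Each sample is also logged to stderr with a wall-clock timestamp so it
    can be lined up with other observations afterwards.
    """
    samples = []
    for i in range(count):
        elapsed, code = time_command(argv)
        samples.append((elapsed, code))
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {argv[0]} took {elapsed:.2f}s (exit {code})",
              file=sys.stderr)
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Calling probe(["qstat"]) from a second terminal while a large job is
being deleted would give one timing per sample, which can then be
compared against the periods where strace shows heavy file I/O.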

