[GE users] Long delay when submitting large jobs

Sean Dilda agrajag at dragaera.net
Fri Jan 14 18:51:05 GMT 2005


Craig Tierney wrote:
> I have been running SGE6.0u1 for a few months now on a new system.
> I have noticed very long delays, or even SGE hangs, when starting
> large jobs.  I just tried this on the latest CVS source and
> the problem persists.
> 
> It appears that the hang occurs while the job is moved from 'qw' to 't'.
> In general the system does continue to operate normally.  However,
> the delays can be large, 30-60 seconds.  By 'hang' I mean that
> commands like qsub and qstat stall until the job has finished
> moving to the 't' state.  Sometimes the delays are long enough
> to cause GDI failures.  Since qmaster is threaded, I wonder why
> I get the hangs.

I've seen similar things.  It's even worse if you try to qdel a large 
job like that.   Like you, I have an entirely Linux-based cluster and am 
using classic spooling over NFS.   At this point, my best guess is that 
it's slowdowns in the spooling module combined with sub-optimal use of 
threads.  However, I haven't gotten much beyond that.
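
One rough way to put numbers on these stalls is to time a client command
such as qstat repeatedly while a large job is being submitted or deleted.
The sketch below is only an illustration: the helper names time_command
and poll_latency, the sample count, the polling interval, and the timeout
are all made up for this example, and it assumes qstat is on the PATH.

```python
import subprocess
import time


def time_command(cmd, timeout=120):
    """Run cmd (a list of arguments) and return the wall-clock seconds it took.

    Hypothetical helper: output is discarded, and a command that exceeds
    the timeout is killed and reported with the time elapsed so far.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except subprocess.TimeoutExpired:
        pass  # treat a timeout as a very long stall
    return time.monotonic() - start


def poll_latency(cmd, samples=30, interval=2.0):
    """Time cmd repeatedly and return the list of elapsed seconds."""
    results = []
    for _ in range(samples):
        results.append(time_command(cmd))
        time.sleep(interval)
    return results


if __name__ == "__main__":
    # Assumption: qstat is installed and the SGE environment is set up.
    for elapsed in poll_latency(["qstat"]):
        print(f"qstat took {elapsed:.2f}s")
```

Running this in one terminal while submitting the large job in another
should show whether the client-side delay lines up with the job moving
from 'qw' to 't'.  It only measures the symptom and says nothing about
whether spooling or threading is the cause.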
