[GE users] Scheduler Problems ... HELP!

murphygb brian.murphy at siemens.com
Tue Jul 13 19:01:34 BST 2010


> Hi,
> 
> On 13.07.2010 at 16:08, murphygb wrote:
> 
> > I have a job that seems to be stuck in the scheduler, and all jobs that get submitted are pending.  From my 'messages' file I have a ton of these:
> > 
> > 07/13/2010 10:03:10|schedu|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:25|worker|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:25|worker|usorl03p430|W|Skipping remaining 8 orders
> > 07/13/2010 10:03:25|schedu|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:40|worker|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:40|worker|usorl03p430|W|Skipping remaining 8 orders
> > 07/13/2010 10:03:40|schedu|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:55|worker|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 07/13/2010 10:03:55|worker|usorl03p430|W|Skipping remaining 8 orders
> > 07/13/2010 10:03:55|schedu|usorl03p430|E|scheduler tries to schedule job 51112.1 twice
> > 
> > If I try to qdel this job, I get the message that the job is already in deletion.  What can I do?  We have rebooted the master but that did not help.  This is 6.2u5.
> 
> Is the job still hanging around somewhere on a node?
> 
> -- Reuti
> 
> 
Well, I was able to clear it by deleting that job from the jobs directory on the master and then restarting the master (thanks Sinisa), but I think it was a byproduct of a bigger issue.  The user is submitting 730 jobs all at once and the scheduler is getting overwhelmed.  As soon as this happens, those jobs and all others submitted after them stay pending, and every qstat -j command returns the job info but no scheduler info, and says:

Can not get job info messages, scheduler is not available

The master has 32GB of memory and 4 processors.  The master process is using 17 and has one CPU pegged at 100%.  Are there any tweaks I can make to resolve this?  Linux RHEL 5.4, SGE 6.2u5
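
For anyone chasing the same symptom, here is a rough sketch (Python) for tallying which job.task IDs show up in the "tries to schedule job ... twice" errors.  It assumes the messages file uses the pipe-separated layout seen in the lines quoted above (date time|thread|host|level|message); the function name and the sample list are only illustrative, not part of Grid Engine.

```python
import re
from collections import Counter

# Assumed record layout, inferred from the quoted log lines:
#   date time|thread|host|level|message
TWICE = re.compile(r"scheduler tries to schedule job (\d+\.\d+) twice")


def count_double_schedules(lines):
    """Count 'schedule job ... twice' error records per job.task ID."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # not a record in the assumed layout
        _timestamp, _thread, _host, level, message = fields
        if level != "E":
            continue  # only error-level records are of interest here
        match = TWICE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample records copied from the messages excerpt quoted in this thread.
sample = [
    "07/13/2010 10:03:10|schedu|usorl03p430|E|scheduler tries to schedule job 51112.1 twice",
    "07/13/2010 10:03:25|worker|usorl03p430|E|scheduler tries to schedule job 51112.1 twice",
    "07/13/2010 10:03:25|worker|usorl03p430|W|Skipping remaining 8 orders",
]

if __name__ == "__main__":
    for job, hits in count_double_schedules(sample).most_common():
        print(job, hits)
```

To run it against a real log, pass an open file handle for your qmaster messages file instead of the sample list.  A single ID dominating the tally would point at one stuck job, as happened here with 51112.1.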

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=267811

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


