[GE users] cannot run on host until clean up of an previous run has finished

templedf dan.templeton at sun.com
Wed Feb 24 14:05:18 GMT 2010


The "cleanup" is really just an excuse.  When a job fails on a host, 
there's a timeout (5 minutes, I think) before the job is allowed to try 
running on that host again.
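
To see which hosts are currently held back this way, the messages in the
qstat -j <jobid> output can be scanned for the host names. The following is
only a minimal sketch: the function name blocked_hosts is made up for
illustration, and the message wording is assumed to match exactly what is
quoted below in this thread.

```python
import re

# Message wording copied from the qstat -j output quoted in this thread.
# The host name is whatever appears between the double quotes.
BLOCKED_HOST = re.compile(
    r'cannot run on host "([^"]+)" until clean up of an previous run has finished'
)


def blocked_hosts(qstat_output):
    """Return the unique hosts named in "clean up" messages, in first-seen order."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    normalized = " ".join(qstat_output.split())
    hosts = []
    for host in BLOCKED_HOST.findall(normalized):
        if host not in hosts:
            hosts.append(host)
    return hosts
```

Pass it the saved text of qstat -j <jobid>. If the list shrinks to nothing
after a few minutes, the hosts were only waiting out the timeout.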

Daniel

On 02/24/10 05:54, prentice wrote:
> Dear GE Users,
>
> A couple of weeks ago, that big snowstorm that hit the Mid-Atlantic took
> out the power to my server room, causing the cluster to go down very
> ungracefully.
>
> Now, a large job can't run because SGE says there aren't enough slots for
> the PE. When I do qstat -j <jobid>, I get a lot of messages like this:
>
> cannot run on host "node24.aurora" until clean up of an previous run has
> finished
>
> I'm sure this is leftover from the ungraceful shutdown of SGE. What is
> the best way to "clean up" these previous runs?
>
>
