[GE users] cannot run on host until clean up of an previous run has finished

prentice prentice at ias.edu
Wed Feb 24 14:08:39 GMT 2010


This problem has been going on much longer than 5 minutes. Is there a
way to clear this "error"? No error is shown for the queue instance, but
jobs aren't running.
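
To see which hosts the scheduler is refusing, the qstat -j <jobid> output can be filtered for the message quoted further down. The following is only a rough sketch: the function name hosts_awaiting_cleanup is made up for illustration, and it assumes the message text matches the wording in this thread exactly. It only reports the hosts; it does not clear anything.

```python
import re

# Scheduler message exactly as quoted in this thread (including "an previous").
# Assumption: other SGE versions may word this differently.
PATTERN = re.compile(
    r'cannot run on host "([^"]+)" until clean up of an previous run has finished'
)


def hosts_awaiting_cleanup(qstat_text):
    """Return the sorted, de-duplicated host names flagged by the message."""
    # Collapse all whitespace so a message wrapped across lines still matches.
    flattened = " ".join(qstat_text.split())
    return sorted(set(PATTERN.findall(flattened)))


# Example using the message from this thread, wrapped the way it was mailed.
sample = (
    'cannot run on host "node24.aurora" until clean up of an previous run has\n'
    "finished\n"
)
print(hosts_awaiting_cleanup(sample))  # prints ['node24.aurora']
```

Passing the captured text of qstat -j <jobid> to the function gives one entry per affected host, which makes it easier to tell whether every node in the PE is blocked or only a few.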

templedf wrote:
> The "cleanup" is really just an excuse.  When a job fails on a host, 
> there's a timeout (5 minutes, I think) before it's allowed to try 
> running on that host again.
> 
> Daniel
> 
> On 02/24/10 05:54, prentice wrote:
>> Dear GE Users,
>>
>> A couple of weeks ago, that big snowstorm that hit the mid-atlantic took
>> out the power to my server room, causing the cluster to go down very
>> ungracefully.
>>
>> Now, a large job can't run because SGE says there aren't enough slots for
>> the PE. When I do qstat -j <jobid>, I get a lot of messages like this:
>>
>> cannot run on host "node24.aurora" until clean up of an previous run has
>> finished
>>
>> I'm sure this is leftover from the ungraceful shutdown of SGE. What is
>> the best way to "clean up" these previous runs?
>>
>>
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=245864
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> 

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=245866
