[GE users] 6.2 all queues dropped

admin at sunsource.net
Mon Nov 17 09:28:30 GMT 2008


> > Today I came back after 4 weeks off and found one test job left which
> > was never scheduled.
> > 
> > qstat -j on this job gave me the scheduling info "All queues dropped
> > because of overload or full".
> > qstat -f didn't list anything.
> > 
> > Restarting the whole cluster didn't change anything.
> > 
> > I had to remove the hostgroup in the hostlist configuration entry of
> > every single clusterqueue and put it back in again to get everything
> > working again. Would there have been an easier way to do so? Or is
> > this a problem of my configuration?
> 
> Did you check (and clear) the error states on the various queues?
> If one of the many other jobs hit something nasty (eg, failed licence
> check in prolog), it can set the queue into an 'E' error state and thus
> prevent anything from getting scheduled.
> 
> /mark
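
A minimal sketch of the check suggested above, assuming the usual `qstat -f` layout in which the state letters appear as a sixth column on queue-instance lines. The function names `error_queues` and `clear_error_queues` are made up for illustration; `qstat -f`, `qstat -f -explain E` and `qmod -cq` are the standard Grid Engine commands.

```shell
# List queue instances whose state column contains E (error).
# Assumption: queue-instance lines look like "queue@host qtype slots load arch states".
error_queues() {
    qstat -f | awk '$1 ~ /@/ && NF >= 6 && $6 ~ /E/ { print $1 }'
}

# Clear the error state on every queue instance reported above.
# Check the reason first with: qstat -f -explain E
clear_error_queues() {
    for q in $(error_queues); do
        qmod -cq "$q"
    done
}
```

Running `error_queues` alone only reads state; `clear_error_queues` needs manager or operator rights. If `error_queues` prints nothing, the queues are not in the E state and the cause is probably elsewhere.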

Hello Mark,
thanks for your suggestions. I checked all queues: none was in an error state, and only a single job was queued. The only odd thing I found was a series of strange messages in the qmaster's log:

no event client known with id 1 to process acknowledgements
no event client known with id 1 to modify

There were loads of these messages. Maybe the scheduler died because of the really bad network setup ...
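
A rough sketch for following up on those messages, assuming the scheduler is the event client with id 1 (as far as I know it registers with qmaster under that id) and that the log sits at the default spool location `$SGE_ROOT/$SGE_CELL/spool/qmaster/messages`, which may differ on a given installation. The function names are made up for illustration; `qconf -sss` and `qconf -secl` are the standard commands.

```shell
# Count "no event client known with id 1" lines in a qmaster messages file.
# Usage (path is an assumption):
#   count_event_client_errors "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
count_event_client_errors() {
    grep -c 'no event client known with id 1' "$1"
}

# Show whether a scheduler is currently registered with qmaster.
scheduler_status() {
    qconf -sss    # show scheduler status (host where it is active)
    qconf -secl   # show the list of registered event clients
}
```

If `qconf -secl` lists no scheduler entry while the count keeps growing, that would fit the guess that the scheduler lost its registration, though it does not prove the network was the cause.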

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88857
