[GE users] 6.2 all queues dropped

craffi dag at sonsorol.org
Fri Nov 14 14:18:43 GMT 2008


On Nov 14, 2008, at 9:07 AM, truneaux wrote:

> Hello,
>
> we have a 6.2 test installation here with clusterqueues for  
> different applications like nastran_small and nastran_large with  
> different wallclock limits.
>
> inside the q-configuration the hostlist is handled through  
> hostgroups ... all nastran_XXX clusterqueues contain as hostlist  
> only the hostgroup @NASTRAN.
>
> Today I came back after 4 weeks off and found one test job left  
> which was never scheduled.
>
> qstat -j on this job gave me the scheduling info "All queues dropped  
> because of overload or full".

This message makes sense if some other user was consuming all the slots,  
or if the load alarm thresholds had been reached on all nodes.
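
One way to check the load-alarm possibility is sketched below. It assumes the Grid Engine client tools (qstat, qconf) are on the PATH; the function names are made up for illustration, and nastran_small is simply the queue name from the original post.

```shell
#!/bin/sh
# Sketch only: assumes the Grid Engine client tools are installed and on PATH.
# The function names below are illustrative, not part of Grid Engine.

# Full queue listing, with the reason printed for every queue instance
# that is in load-alarm state (shown as "a" in the states column).
explain_queue_alarms() {
    command -v qstat >/dev/null 2>&1 || { echo "qstat not found" >&2; return 1; }
    qstat -f -explain a
}

# Print the load_thresholds line from one cluster queue's configuration.
# $1: cluster queue name, e.g. nastran_small
show_load_thresholds() {
    command -v qconf >/dev/null 2>&1 || { echo "qconf not found" >&2; return 1; }
    qconf -sq "$1" | grep '^load_thresholds'
}
```

After sourcing these definitions, "explain_queue_alarms" followed by "show_load_thresholds nastran_small" should show whether any queue instance is in alarm state and which threshold applies to it.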

>
> qstat -f didn't list anything.


How about "qstat -f -u '*'"?

By default, 6.2 only shows qstat data for the user running the command.  
Try it with "-u '*'" and you may see other jobs that are consuming slots.  
Perhaps someone else was active at the time your test job was  
supposed to run?
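
The two checks discussed so far can be wrapped up roughly as follows. This is a minimal sketch that assumes qstat is on the PATH; the function names are invented for illustration and the job id is a placeholder you supply yourself.

```shell
#!/bin/sh
# Sketch only: assumes the Grid Engine client tools are installed and on PATH.
# The function names below are illustrative, not part of Grid Engine.

# List every queue instance together with the jobs of all users,
# not just the jobs of the user running the command.
show_all_jobs() {
    command -v qstat >/dev/null 2>&1 || { echo "qstat not found" >&2; return 1; }
    qstat -f -u '*'
}

# Show the detailed job record, including the scheduling info that
# produced the "All queues dropped ..." message.
# $1: job id (placeholder)
show_job_scheduling_info() {
    command -v qstat >/dev/null 2>&1 || { echo "qstat not found" >&2; return 1; }
    qstat -j "$1"
}
```

Running "show_all_jobs" first and then "show_job_scheduling_info <job_id>" for the stuck job gives both views side by side: who holds the slots, and why the scheduler skipped the job.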

>
>
> Restarting the whole cluster didn't change anything.
>
> I had to remove the hostgroup in the hostlist configuration entry of  
> every single clusterqueue and put it back in again to get everything  
> working again. Would there have been an easier way to do so? Or is  
> this a problem of my configuration?
>
> Hoping for hints, best regards
>
> Christian Trunsperger

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88761
