[GE users] 6.2 all queues dropped
dag at sonsorol.org
Fri Nov 14 14:18:43 GMT 2008
On Nov 14, 2008, at 9:07 AM, truneaux wrote:
> we have a 6.2 test installation here with clusterqueues for
> different applications like nastran_small and nastran_large with
> different wallclock limits.
> inside the q-configuration the hostlist is handled through
> hostgroups ... all nastran_XXX clusterqueues contain as hostlist
> only the hostgroup @NASTRAN.
> Today I came back after 4 weeks off and found one test job left
> which was never scheduled.
> qstat -j on this job gave me the scheduling info "All queues dropped
> because of overload or full".
This error makes sense if some other user was consuming the slots, or if
the load alarm thresholds had been reached on all nodes.
> qstat -f didn't list anything.
How about "qstat -f -u '*'"?
By default, 6.2 only shows qstat data for the user running the command.
Try with "-u '*'" and you may see other jobs that are consuming slots.
Perhaps someone else was active at the time your test job was
supposed to run?
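
A rough sketch of the two checks mentioned above, wrapped in small shell helpers. The function names are made up for illustration, and this assumes the Grid Engine client tools are on your PATH. "-explain a" asks qstat to print the reason for any queue instance in load alarm state.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Grid Engine).

# Full queue listing including jobs from every user, not just your own.
show_all_users() {
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found in PATH" >&2
        return 127
    fi
    qstat -f -u '*'
}

# Full queue listing with the reason for each load alarm state.
show_load_alarms() {
    if ! command -v qstat >/dev/null 2>&1; then
        echo "qstat not found in PATH" >&2
        return 127
    fi
    qstat -f -explain a
}

# Example use on a submit or admin host:
#   show_all_users
#   show_load_alarms
```

If the first listing shows every slot taken by other users' jobs, or the second shows alarm states on all hosts in @NASTRAN, that would account for the "All queues dropped" message.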
> Restarting the whole cluster didn't change anything.
> I had to remove the hostgroup in the hostlist configuration entry of
> every single clusterqueue and put it back in again to get everything
> working again. Would there have been an easier way to do so? Or is
> this a problem of my configuration?
> Hoping for hints, best regards
> Christian Trunsperger