[GE users] Node exclusion by users
reuti at staff.uni-marburg.de
Tue Aug 8 21:34:15 BST 2006
On 08.08.2006 at 22:05, Iwona Sakrejda wrote:
> Reuti wrote:
>> On 07.08.2006 at 18:44, Iwona Sakrejda wrote:
>>> Is it possible for a user to stop his jobs from running
>>> on a few nodes? Sometimes nodes break down in a way that
>>> is not easy to detect by monitoring. And if jobs flush through
>>> a node it's not always a problem with the node, most often
>>> a poorly constructed job. I would like to give users an
>>> option during weekends or evenings when support is not
>>> staffed to declare - "I don't want to run on the following nodes".
>> the user can specify all the queue instances with -q
>> serial@node01,serial@node02
> with ~300 nodes available and only one or two to exclude,
> that might be tedious, but doable. Is there a limit on the length
> of that argument?
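A minimal sketch of how such a long -q list could be generated instead of typed by hand. It assumes a plain file with one host name per line (all_hosts.txt is a hypothetical name; the output of `qconf -sel` could serve as a source) and a hypothetical job script job.sh. Whether -q has a length limit is not answered here, so treat this as an illustration only.

```shell
#!/bin/sh
# build_queue_list QUEUE "EXCLUDED HOSTS" < hostlist
# Reads one host name per line on stdin and prints a comma-separated
# list of queue instances (QUEUE@host), skipping every host named in
# the space-separated exclusion string.
build_queue_list() {
    queue=$1
    excludes=" $2 "          # pad with spaces so whole names are matched
    list=""
    while read -r host; do
        [ -n "$host" ] || continue            # ignore empty lines
        case "$excludes" in
            *" $host "*) continue ;;          # host is on the exclusion list
        esac
        list="${list:+$list,}$queue@$host"    # append with a comma separator
    done
    printf '%s\n' "$list"
}

# Usage (file name and job script are hypothetical):
#   qsub -q "$(build_queue_list serial 'node08 node09' < all_hosts.txt)" job.sh
```

The user only has to maintain the short list of hosts to avoid; everything else is derived from the full host list.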
>> or submit to a complete queue domain, if you set up hostgroups
>> beforehand for some of the machines,
> so can I do:
> serial@myhostgroup1,serial@myhostgroup2,serial@node08,serial@node09
> I mean is mixing host groups and hosts allowed?
Yes, this should work. You could even use a wildcard *, but remember
that hostgroups have an @ in their name, so it would read
>> which are available at the weekend.
> This is part of the problem. I do not want to exclude any
> nodes a priori. I only want users to have a chance
> to avoid nodes they think are bad for them....
Aha, now I understand your intention. Is there at least one reasonable
person available at weekends? That person could be made the owner of the
queue and thereby be allowed to disable the queue instances on the nodes
in question.
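As a rough sketch, the queue owner could disable the suspicious instances with `qmod -d` (and re-enable them later with `qmod -e`). The helper name disable_instances and the DRY_RUN switch are my own additions for illustration; by default the function only prints the commands it would run.

```shell
#!/bin/sh
# disable_instances QUEUE HOST...
# With DRY_RUN=1 (the default in this sketch) the qmod commands are only
# printed; with DRY_RUN=0 they are executed, which requires Grid Engine
# and sufficient rights on the queue (for example being a queue owner).
disable_instances() {
    queue=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "qmod -d $queue@$host"
        else
            qmod -d "$queue@$host"
        fi
    done
}

# Usage:
#   disable_instances serial node08 node09            # show what would happen
#   DRY_RUN=0 disable_instances serial node08 node09  # actually disable
```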
We had some black holes in the cluster caused by a full /tmp. We now use
the load sensor from http://gridengine.sunsource.net/howto/loadsensor.html
and set up a load_threshold that closes the queue if less than 1 GB is
free, to avoid this.
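For illustration, a load sensor along these lines might look like the sketch below. It is not the script from the howto. It assumes a complex named tmpfree (a hypothetical name that would have to be defined in the complex configuration, here assumed to be of a memory type so that a K suffix is accepted) and follows the usual load sensor protocol: wait for a line on stdin, stop on "quit", otherwise print a report framed by "begin" and "end".

```shell
#!/bin/sh
# Free space of a directory's filesystem in kilobytes, taken from the
# "Available" column of POSIX-format df output.
tmp_free_kb() {
    df -Pk "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

# Print one load report: begin, host:name:value, end.
# $1 = host name, $2 = free space in KB. "tmpfree" is an assumed complex name.
emit_report() {
    echo "begin"
    echo "$1:tmpfree:${2}K"
    echo "end"
}

# Main loop: each line read from stdin triggers one report;
# the line "quit" (or end of input) terminates the sensor.
sensor_loop() {
    host=$(uname -n)    # must match the host name Grid Engine knows
    while read -r input; do
        [ "$input" = "quit" ] && break
        emit_report "$host" "$(tmp_free_kb /tmp)"
    done
}

# When installed as a load sensor, the script would end with a call to:
#   sensor_loop
```

With such a value reported, a threshold in the queue configuration (something like tmpfree=1G in the load thresholds, given a suitable relation operator on the complex) would put the queue instance into alarm state once /tmp runs low, so no new jobs land there.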
HTH - Reuti