[GE users] nodes overloaded: processes placed on already full nodes
reuti at staff.uni-marburg.de
Fri Dec 17 13:16:49 GMT 2010
On 15.12.2010 at 17:23, steve_s wrote:
> On Dec 15 16:28 +0100, reuti wrote:
>> On 15.12.2010 at 16:13, templedf wrote:
>>> This is a known issue. When scheduling parallel jobs with 6.2 to 6.2u5,
>>> the scheduler ignores host load.
>>> This often results in jobs piling up
>>> on a few nodes while other nodes are idle.
> OK, good to know. We're running 6.2u3 here.
> I'm not sure if I get this right: Even if the load is ignored, doesn't
> SGE keep track of already given-away slots on each node? I always
> thought that this is the way jobs are scheduled in the first place
> (besides policies and all that, but that should have nothing to do with
> load or slots in this context).
> Given that SGE knows e.g. np_load_avg on each node, I thought we could
> circumvent the problem by setting np_load_avg to requestable=YES and
> then something like
> $ qsub -hard -l 'np_load_avg < 0.3' ...
You can only specify a value; the relation is already defined in the complex definition.
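As a rough sketch of where that relation lives: `qconf -sc` prints the complex definitions one per line, with the relation operator in the fourth column (name, shortcut, type, relop, ...). The helper name `show_relop` below is made up for illustration and is not part of SGE.

```shell
# show_relop is a hypothetical helper, not an SGE command.
# It prints the relop column (4th field of `qconf -sc`) for one complex.
show_relop() {
    qconf -sc | awk -v name="$1" '$1 == name { print $4 }'
}

# Example call on a host with SGE installed:
#   show_relop np_load_avg
#
# The request itself then carries only a value, never an operator
# (this assumes np_load_avg has been made requestable):
#   qsub -hard -l np_load_avg=0.3 job.sh
```

Here `job.sh` is a placeholder for the job script.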
> but this gives me
> "Unable to run job: denied: missing value for request "np_load_avg".
> whereas using "=" or ">" works. I guess the reason is what is stated in
If > works, that's a bug. I get: Unable to run job: unknown resource "fubar>12". (The same happens for <; maybe it was fixed in 6.2u5.)
> ">=, >, <=, < operators can only be overridden, when the new value
> is more restrictive than the old one."
> So, I cannot use "<". If that is the case, what can we do about it? Do
> we need to define a new complex attribute (say 'np_load_avg_less') along
> with a load_sensor or can we hijack np_load_avg in another way?
>> As far as I understood the problem, the nodes are oversubscribed by getting more than 8 processes scheduled.
So, we know what to deal with.
>> Did you change the host assignment to certain queues while jobs were still running? Maybe you need to limit the total number of slots per machine to 8 in an RQS, or set it in each host's complex_values.
> No, we didn't change the host assignment.
> Sorry, but what do you mean by RQS? Did not see that in the
> documentation so far.
When you have more than one queue on a machine, all slots of every queue might get used, thus oversubscribing the machine. Hence the total number of slots used across all queues at any one time must be limited on each machine. When you have only one queue per machine, this can't happen.
>> Another reason for virtual oversubscription: processes in state "D" count as running, so despite the high load everything is in order.
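A minimal sketch of such a limit, written as a resource quota set (RQS). The rule name, file name and the host name `node01` are placeholders; the value 8 is the per-machine slot count mentioned above.

```shell
# Write an RQS that caps the slots used on each host at 8,
# summed over all queues. File and rule names are placeholders.
rqs_file="max_slots_per_host.rqs"
cat > "$rqs_file" <<'EOF'
{
   name         max_slots_per_host
   description  "At most 8 slots per host, summed over all queues"
   enabled      TRUE
   limit        hosts {*} to slots=8
}
EOF

# As an SGE manager, the file could then be loaded with:
#   qconf -Arqs "$rqs_file"
#
# Alternative: set the limit per host in complex_values, e.g.
#   qconf -aattr exechost complex_values slots=8 node01
```

The `{*}` form applies the limit to each host separately rather than to all hosts together.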
> Oversubscribed nodes do not always run 16 instead of 8 processes, some
> only 14 or so. Nevertheless, the load is always almost exactly 16. As
> far as I can see, processes on these oversubscribed nodes (with > 8
> processes) run with ~50% CPU load each.
What does
ps -e f
(f w/o -) show on such a node? Are all the processes bound to an sge_shepherd, or did some jump out of the process tree and weren't killed?
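To see how much of the load comes from processes in state "D" rather than from running ones, something like the following could be used on a suspect node. It is only a sketch: `count_load_states` is a made-up helper name, and it assumes a `ps` that accepts `-eo stat=` (as the Linux procps version does).

```shell
# count_load_states is a hypothetical helper. It reads one process
# state per line (as printed by `ps -eo stat=`) and counts how many
# start with R (running/runnable) and how many with D
# (uninterruptible sleep).
count_load_states() {
    awk '{ s = substr($1, 1, 1) }
         s == "R" { r++ }
         s == "D" { d++ }
         END { printf "R=%d D=%d\n", r, d }'
}

# On the node in question:
#   ps -eo stat= | count_load_states
```

If the D count is high while the R count stays near the number of granted slots, the node is probably not really oversubscribed.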