[GE users] nodes overloaded: processes placed on already full nodes

templedf daniel.templeton at oracle.com
Wed Dec 15 15:13:12 GMT 2010

This is a known issue.  When scheduling parallel jobs, 6.2 through 
6.2u5 ignores host load.  This often results in jobs piling up on a 
few nodes while other nodes sit idle.  The issue is fixed in 6.2u6 
(currently available only as part of the commercial product).
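
Until an upgrade is possible, you can at least confirm the placement 
and try a slot-based workaround of the kind that has been suggested on 
this list for similar placement problems.  This is only a sketch, 
assuming the default scheduler configuration (queue_sort_method load, 
load_formula np_load_avg); since the bug is in the parallel scheduling 
path itself, it may mitigate rather than cure the problem:

    # List the jobs running on each host next to its core count and
    # load average; oversubscribed hosts stand out immediately.
    $ qhost -j

    # Show the current scheduler configuration.
    $ qconf -ssconf

    # Open the scheduler configuration in $EDITOR and rank hosts by
    # occupied slots instead of reported load.  The slots consumable
    # is bookkept internally by the scheduler, so it does not depend
    # on load reports:
    $ qconf -msconf
      queue_sort_method                 load
      load_formula                      slots

With load_formula set to slots, hosts with the fewest slots in use 
sort first.  If jobs still land on full nodes, the upgrade to 6.2u6 
is the only real fix.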


On 12/15/10 06:55 AM, steve_s wrote:
> On Dec 15 15:16 +0100, reuti wrote:
>>> However, lately we observed the following. We have a bunch of 8-core
>>> nodes connected by Infiniband and running MPI jobs across nodes. We found
>>> that processes often get placed on full nodes which have 8 MPI processes
>>> already running. This leaves us with many oversubscribed (load 16
>>> instead of 8) nodes. This happens although there are many empty nodes
>>> left in the queue. It is almost as if the slots already taken on one
>>> node are ignored by SGE.
>> how many slots are defined in the queue definition, and how many queues do you have defined?
>      $ qconf -sql
>      adde.q
>      all.q
>      test.q
>      vtc.q
> Only the first and last queue are used and only the first is used for
> parallel jobs. Nodes belong to only one queue at a time such that jobs
> in different queues cannot run on the same node.
> 8 slots (see attachment for full output).
>      $ qconf -sq adde.q | grep slot
>      slots                 8
> Thank you.
> best,
> Steve

