[GE users] nodes overloaded: processes placed on already full nodes

steve_s elcortogm at googlemail.com
Wed Dec 15 13:58:14 GMT 2010


We have been using SGE for a while now and are quite happy with it.

However, lately we have observed the following. We have a number of
8-core nodes connected by InfiniBand, and we run MPI jobs across nodes.
We found that processes often get placed on full nodes that already have
8 MPI processes running. This leaves us with many oversubscribed nodes
(load 16 instead of 8), even though many empty nodes are still
available in the queue. It is almost as if SGE ignores the slots already
taken on a node.

This is seen with both Open MPI and Intel MPI, and with different
applications. None of the applications uses threading or does anything
else that would create more processes than the requested slots.
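
To make "oversubscribed" concrete, here is a minimal sketch of the check
described above: flag every host whose load average clearly exceeds its
slot count. It assumes the per-host numbers have already been collected
by hand (for example from the output of qhost and qstat -f). The
HostState type, the oversubscribed() helper, the tolerance parameter,
and the host names and values are all hypothetical and only serve as an
illustration.

```python
from typing import List, NamedTuple


class HostState(NamedTuple):
    """Per-host numbers, entered by hand (hypothetical structure)."""
    name: str
    slots_total: int   # slots configured for the host
    slots_used: int    # slots the scheduler reports as occupied
    load_avg: float    # load average reported for the host


def oversubscribed(hosts: List[HostState], tolerance: float = 1.0) -> List[str]:
    """Return names of hosts whose load exceeds their slot count.

    The tolerance is an assumed margin so that a fully used node with
    ordinary load jitter is not reported.
    """
    return [h.name for h in hosts if h.load_avg > h.slots_total + tolerance]


# Hypothetical example values, not measurements from our cluster.
hosts = [
    HostState("node01", slots_total=8, slots_used=8, load_avg=16.1),
    HostState("node02", slots_total=8, slots_used=0, load_avg=0.0),
    HostState("node03", slots_total=8, slots_used=8, load_avg=8.0),
]

print(oversubscribed(hosts))
```

Comparing slots_used against load_avg for the flagged hosts would then
show whether SGE itself believes it handed out more than 8 slots, or
whether the extra processes arrived without being counted.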

Has anybody made similar observations? We would be grateful for any
hints on how to debug this.


