[GE users] nodes overloaded: processes placed on already full nodes

steve_s elcortogm at googlemail.com
Wed Dec 15 14:55:34 GMT 2010


On Dec 15 15:16 +0100, reuti wrote:
> > However, lately we observed the following. We have a bunch of 8-core
> > nodes connected by Infiniband and running MPI jobs across nodes. We found
> > that processes often get placed on full nodes which have 8 MPI processes
> > already running. This leaves us with many oversubscribed (load 16
> > instead of 8) nodes. This happens although there are many empty nodes
> > left in the queue. It is almost as if the slots already taken on one
> > node are ignored by SGE. 
> 
> how many slots are defined in the queue definition, and how many queues do you have defined?

    $ qconf -sql
    adde.q
    all.q
    test.q
    vtc.q

Only the first and last queues are used, and only the first is used for
parallel jobs. Each node belongs to only one queue at a time, so jobs
in different queues cannot run on the same node.


adde.q has 8 slots (see the attachment for the full output):

    $ qconf -sq adde.q | grep slot
    slots                 8
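
A rough way to list the nodes that are actually oversubscribed is to filter
the qhost output by load average. The helper below is only a sketch: the
function name overloaded_hosts is made up for illustration, and it assumes a
qhost layout with two header lines, the hostname in field 1 and the load
average in field 4. Check this against the header line on your own
installation before relying on it.

```shell
# Hypothetical helper: print hosts whose load average exceeds a limit.
# Reads qhost-style lines on stdin. Assumes two header lines, hostname in
# field 1 and load average in field 4 (verify against your qhost header).
overloaded_hosts() {
    limit="${1:-8}"    # default limit: 8, the slot count of adde.q
    awk -v limit="$limit" '
        # Skip the header and separator lines, and any host without a
        # numeric load (shown as "-", e.g. the "global" pseudo host).
        NR > 2 && $4 != "-" && $4 + 0 > limit + 0 { print $1, $4 }
    '
}

# Intended use on a submit or admin host:
#   qhost | overloaded_hosts 8
```

With a limit of 8 this would report the nodes running at a load of about 16
while leaving the correctly filled ones out.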

Thank you.

best,
Steve

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=305824


    [ Attachment: "qconf-sq.txt.tgz", Application/X-GTAR, 985 bytes ]


