[GE users] problem with job distributions

mad margaret_Doll at brown.edu
Tue Mar 10 19:35:45 GMT 2009


I have compute nodes, each of which has eight processors.  I have
assigned eight compute nodes to one of my queues.  The compute nodes
are listed as groupa, which is on the hostlist of the group-a queue.
In the General Configuration for the group-a queue, I have slots set
to 8.  When I look at the Cluster Queues, queue group-a has 64 total
slots.  Currently 52 slots are shown as being used in qmon.

However, when I execute "qstat -f | grep group-a", I get

group-a@compute-0-0.local       BIP   8/8       8.08     lx26-amd64
group-a@compute-0-1.local       BIP   8/8       8.06     lx26-amd64
group-a@compute-0-10.local      BIP   8/8       11.12    lx26-amd64
group-a@compute-0-11.local      BIP   2/8       10.22    lx26-amd64
group-a@compute-0-12.local      BIP   6/8       7.16     lx26-amd64
group-a@compute-0-13.local      BIP   4/8       4.78     lx26-amd64
group-a@compute-0-2.local       BIP   8/8       8.12     lx26-amd64
group-a@compute-0-3.local       BIP   8/8       15.13    lx26-amd64    a

The total number of slots in use is 52, which agrees with qmon.
However, the load shows 59 jobs.
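
For anyone who wants to repeat this cross-check, here is a minimal
Python sketch that parses the queue lines from the listing, totals the
used and available slots, and flags hosts whose load average is well
above the number of slots in use.  The function name summarize, the
regular expression, and the margin of 1.0 are my own illustrative
assumptions, not part of Grid Engine.  It also assumes the slot column
is printed as used/total, as in the listing.

```python
import re

# Queue lines copied from the qstat -f listing. The queue instance may
# appear as "queue@host" or, in the archived mail, as "queue at host".
QSTAT_LINES = """\
group-a@compute-0-0.local       BIP   8/8       8.08     lx26-amd64
group-a@compute-0-1.local       BIP   8/8       8.06     lx26-amd64
group-a@compute-0-10.local      BIP   8/8       11.12    lx26-amd64
group-a@compute-0-11.local      BIP   2/8       10.22    lx26-amd64
group-a@compute-0-12.local      BIP   6/8       7.16     lx26-amd64
group-a@compute-0-13.local      BIP   4/8       4.78     lx26-amd64
group-a@compute-0-2.local       BIP   8/8       8.12     lx26-amd64
group-a@compute-0-3.local       BIP   8/8       15.13    lx26-amd64    a
""".splitlines()

# Assumed column layout: queue instance, queue type, used/total, load average.
LINE_RE = re.compile(
    r"^(?P<queue>\S+?)(?:@| at )(?P<host>\S+)\s+\S+\s+"
    r"(?P<used>\d+)/(?P<total>\d+)\s+(?P<load>\d+(?:\.\d+)?)"
)


def summarize(lines, margin=1.0):
    """Return (used_slots, total_slots, overloaded) for qstat -f queue lines.

    overloaded lists (host, used, load) for every host whose load average
    exceeds its used slot count by more than `margin` (an arbitrary value).
    """
    used_slots = 0
    total_slots = 0
    overloaded = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a queue instance line
        used = int(m["used"])
        total = int(m["total"])
        load = float(m["load"])
        used_slots += used
        total_slots += total
        if load > used + margin:
            overloaded.append((m["host"], used, load))
    return used_slots, total_slots, overloaded


used_slots, total_slots, overloaded = summarize(QSTAT_LINES)
print(f"slots in use: {used_slots} of {total_slots}")
for host, used, load in overloaded:
    print(f"{host}: {used} slots in use, load average {load}")
```

On a live cluster the same function could be fed the output of
"qstat -f | grep group-a" instead of the pasted lines.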

If I ssh into compute-0-3, I see 15 jobs being run by one user.
All jobs except one are using 50% of a CPU.

My users say they are using variations of

qsub -pe queue-a 20 scriptp


Why would the distribution of jobs be so out of whack?  I have been
running this cluster with this version of the system for about six
months now.  The only other time the distribution was uneven was
before one of my users learned to use qsub properly.



Running ROCKS 5.3 with Redhat 2.6.18-53.1.14.el5

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=126858