[GE users] problem with job distributions
margaret_Doll at brown.edu
Tue Mar 10 19:35:45 GMT 2009
I have compute nodes, each of which has eight processors. I have
assigned eight compute nodes to one of my queues. The compute nodes
are listed as groupa, which is on the hostlist of the group-a queue. In
the General Configuration for the group-a queue, I have slots set to
8. When I look at the Cluster Queues, queue group-a has 64 total
slots. Currently 52 slots are shown as being used in qmon.
However, when I execute "qstat -f | grep group-a", I get
group-a@compute-0-0.local BIP 8/8 8.08 lx26-amd64
group-a@compute-0-1.local BIP 8/8 8.06 lx26-amd64
group-a@compute-0-10.local BIP 8/8 11.12 lx26-amd64
group-a@compute-0-11.local BIP 2/8 10.22 lx26-amd64
group-a@compute-0-12.local BIP 6/8 7.16 lx26-amd64
group-a@compute-0-13.local BIP 4/8 4.78 lx26-amd64
group-a@compute-0-2.local BIP 8/8 8.12 lx26-amd64
group-a@compute-0-3.local BIP 8/8 15.13 lx26-amd64 a
The total number of slots being used is 52, which agrees with qmon.
However, the load shows 59 jobs.
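The comparison between used slots and load can be scripted. The
following is only a rough sketch in Python, assuming the column layout
shown in the listing (queue instance, type, used/total slots, load
average, architecture, optional state). The names ROW, parse_listing,
overloaded_hosts and the margin of 1.0 are my own illustrative choices,
not part of Grid Engine. Keep in mind that a load average counts
runnable processes, so it is not a count of Grid Engine jobs.

```python
import re

# Listing copied from the qstat output quoted in this message.
LISTING = """\
group-a@compute-0-0.local BIP 8/8 8.08 lx26-amd64
group-a@compute-0-1.local BIP 8/8 8.06 lx26-amd64
group-a@compute-0-10.local BIP 8/8 11.12 lx26-amd64
group-a@compute-0-11.local BIP 2/8 10.22 lx26-amd64
group-a@compute-0-12.local BIP 6/8 7.16 lx26-amd64
group-a@compute-0-13.local BIP 4/8 4.78 lx26-amd64
group-a@compute-0-2.local BIP 8/8 8.12 lx26-amd64
group-a@compute-0-3.local BIP 8/8 15.13 lx26-amd64 a
"""

# Assumed layout: queue@host TYPE used/total load arch [state].
# The archive's " at " spelling of "@" is accepted as well.
ROW = re.compile(
    r"^(\S+?)(?:@| at )(\S+)\s+\S+\s+(\d+)/(\d+)\s+([\d.]+)\s+\S+(?:\s+(\S+))?$"
)


def parse_listing(listing):
    """Return one dict per queue instance line that matches ROW."""
    rows = []
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip anything that is not a queue instance line
        queue, host, used, total, load, state = match.groups()
        rows.append({
            "queue": queue,
            "host": host,
            "used": int(used),
            "total": int(total),
            "load": float(load),
            "state": state or "",
        })
    return rows


def overloaded_hosts(rows, margin=1.0):
    """Hosts whose load exceeds their used slots by more than `margin`.

    `margin` is a hypothetical threshold chosen only for illustration.
    """
    return [r["host"] for r in rows if r["load"] - r["used"] > margin]


rows = parse_listing(LISTING)
used_slots = sum(r["used"] for r in rows)
total_slots = sum(r["total"] for r in rows)
print("slots used:", used_slots, "of", total_slots)
print("hosts with load well above used slots:", overloaded_hosts(rows))
```

For the listing above this reports 52 of 64 slots in use and points at
compute-0-10, compute-0-11, compute-0-12 and compute-0-3 as the hosts
where the load is more than 1.0 above the number of slots in use.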
If I ssh into compute-0-3, I see 15 jobs being run by one user.
All of the jobs except one are using 50% of a CPU.
My users say they are using variations of
qsub -pe queue-a 20 scriptp
Why would the distribution of jobs be so out of whack? I have been
running this cluster with this version of the system for about six
months now. The only other time the distribution was uneven was
before one of my users learned to use qsub properly.
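To reason about where a 20-slot request such as the qsub line above
could land on hosts with 8 slots each, here is a toy model in Python.
Grid Engine parallel environments have an allocation_rule setting, and
$fill_up and $round_robin are two of its values. The functions below
are hypothetical simplifications of those two rules written for
illustration; they are not scheduler code, the host names are made up,
and the real scheduler also takes load and other requests into account.

```python
def fill_up(free_slots, requested):
    """Toy model: take every free slot on a host before moving on."""
    grant = {}
    remaining = requested
    for host, free in free_slots.items():
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            grant[host] = take
            remaining -= take
    if remaining:
        raise ValueError("not enough free slots for the request")
    return grant


def round_robin(free_slots, requested):
    """Toy model: hand out one slot per host per pass until done."""
    grant = {host: 0 for host in free_slots}
    remaining = requested
    while remaining:
        progressed = False
        for host, free in free_slots.items():
            if remaining and grant[host] < free:
                grant[host] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            raise ValueError("not enough free slots for the request")
    return {host: n for host, n in grant.items() if n}


# Hypothetical empty cluster: eight hosts with eight free slots each.
free = {"node%d" % i: 8 for i in range(8)}

print("fill_up:    ", fill_up(free, 20))
print("round_robin:", round_robin(free, 20))
```

In this model the first rule packs the 20 slots onto three hosts
(8, 8 and 4), while the second spreads them over all eight hosts
(3 on four of them, 2 on the rest). Either way no host is granted more
than its 8 slots, so 15 processes on one host would have to come from
somewhere other than the slot grant itself.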
Running ROCKS 5.3 with Red Hat kernel 2.6.18-53.1.14.el5