[GE users] Trouble with load thresholds

opoplawski orion at cora.nwra.com
Mon Mar 8 18:44:44 GMT 2010


Using gridengine 6.2u5.  I've got a couple of machines in our grid that 
have a lot of interactive use, so I limit grid access with a load 
threshold of np_load_avg = 1, a suspend threshold of 1.3 or 1.02, and a 
load adjustment for np_load_avg of 1.

However, my 8 core machines are getting woefully underused.

Two different cases:

hobbes, suspend threshold of 1.3.  top shows the load average has been 
around 3.7-4.3.  I generally only see one or two jobs at a time ever get 
run on it.  qstat -j shows:

    queue instance "all.q@hobbes.cora.nwra.com" dropped because it is
    overloaded: np_load_avg=1.003750 (= 0.541250 + 1.0 * 3.700000 with
    nproc=8) >= 1

I would have expected about 3-4 jobs on it.  I can't make any sense of 
what the above line is supposed to be telling me.


josiah, suspend threshold of 1.02.  Steady load average of about 3.3.

It has 3 jobs on it, but qstat alternates between:

    queue instance "compute.q@josiah.cora.nwra.com" dropped because it is
    overloaded: np_load_avg=1.016250 (= 0.425000 + 1.0 * 4.730000 with
    nproc=8) >= 1

and

    queue instance "compute.q@josiah.cora.nwra.com" is in suspend alarm:
    np_load_avg=1.026250 (= 0.425000 + 1.0 * 4.810000 with nproc=8) >=
    1.02
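
For what it's worth, the numbers in all three messages are arithmetically 
consistent with one reading: the first term is the reported np_load_avg 
(0.541250 * 8 = 4.33 and 0.425 * 8 = 3.4, close to what top shows), and 
the adjustment term is divided by nproc before being added.  That reading 
is an assumption inferred from the figures, not something the message 
itself states.  A minimal sketch that recomputes the printed values (the 
function and variable names are made up for illustration):

```python
def adjusted_np_load_avg(reported, adjustment, job_term, nproc):
    """Recompute the value printed by qstat from the numbers in parentheses.

    Assumption: the adjustment term is normalised by nproc, i.e.
    reported + adjustment * job_term / nproc.
    """
    return reported + adjustment * job_term / nproc


# (host, reported np_load_avg, adjustment, job term, nproc, threshold)
messages = [
    ("hobbes", 0.541250, 1.0, 3.70, 8, 1.0),
    ("josiah", 0.425000, 1.0, 4.73, 8, 1.0),
    ("josiah", 0.425000, 1.0, 4.81, 8, 1.02),
]

for host, reported, adjustment, job_term, nproc, threshold in messages:
    value = adjusted_np_load_avg(reported, adjustment, job_term, nproc)
    # Compare against the threshold the way the message does (>=).
    print(f"{host}: np_load_avg={value:.6f} >= {threshold}: {value >= threshold}")
```

If that reading is right, the adjustment alone contributes about 0.46 on 
hobbes and about 0.59-0.60 on josiah, which is what pushes both hosts over 
their thresholds even though the reported np_load_avg is well below 1.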


Some thoughts -

- These are very short jobs, just a few seconds of CPU time each.  Could 
that be playing havoc with the load adjustments?  Does a job's load 
adjustment get removed when the job ends?

- Why are load adjustments used to suspend jobs?  I think that should 
only use the actual load of the machine.



-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com
