[GE users] Jobs waiting due to loss of resources
dan.templeton at sun.com
Mon Dec 28 19:02:47 GMT 2009
"Load value" just means it's a value that was reported from a load
sensor. I agree that it sounds like you have a load sensor that's
misbehaving. The load sensor for a global value would not be set in the
global host config (qconf -sconf). It would instead be set for one
specific host. (Yes, it sounds illogical, but it actually makes sense
if you think it through.) Check the "qconf -sconf <host>" output for
all your machines, e.g.
for host in `qconf -sel`; do
  qconf -sconf "$host" | grep load_sensor
done
And when you find it, write yourself a note so that you don't have to go
looking for it again in the future. :)
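
If the bare loop is too noisy, here is a slightly more defensive sketch of the same idea. It prints the host name next to each hit and skips hosts whose load_sensor is unset or "none". The function name find_load_sensors is made up for illustration, and it assumes qconf is on your PATH and that load_sensor appears as the first word on its line in the "qconf -sconf <host>" output.

```shell
#!/bin/sh
# Sketch: report which hosts have a load_sensor set in their local configuration.
# Assumption: qconf is on PATH and you are allowed to read host configurations.

find_load_sensors() {
    for host in $(qconf -sel); do
        # Hosts without a local configuration make qconf print an error; hide it.
        sensor=$(qconf -sconf "$host" 2>/dev/null |
            awk '$1 == "load_sensor" { print $2 }')
        # Report only hosts where a sensor is actually configured.
        if [ -n "$sensor" ] && [ "$sensor" != "none" ] && [ "$sensor" != "NONE" ]; then
            printf '%s: %s\n' "$host" "$sensor"
        fi
    done
}

# Run the scan only where Grid Engine's qconf is available.
if command -v qconf >/dev/null 2>&1; then
    find_load_sensors
fi
```

Each output line has the form "host: sensor-path", so the host that reports the global value should stand out immediately.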
>>> For the past 2 days, the jobs in my queue have not been launched.
>>> If I 'qstat' pending jobs, I get the following scheduling_info:
>>> queue instance "all.q@master.cluster" dropped because it is full
>>> (-l fluentall=1) cannot run globally
>>> because it offers only gl:fluentall=0.000000
>> gl: means it's a load value. Is the process which returns this load
>> still running (`ps` or alike) (defined in `qconf -sconf` entry
> "load_sensor" is set to none in `qconf -sconf`.
> I am not sure it has been defined to something else before.
> I am surprised that this resource "fluentall" is defined as a load value, since it should represent a software license.