[GE users] Jobs waiting due to loss of resources

templedf dan.templeton at sun.com
Mon Dec 28 19:02:47 GMT 2009


"Load value" just means it's a value that was reported from a load 
sensor.  I agree that it sounds like you have a load sensor that's 
misbehaving.  The load sensor for a global value would not be set in the 
global host config (qconf -sconf).  It would instead be set for one 
specific host.  (Yes, it sounds illogical, but it actually makes sense 
if you think it through.)  Check the "qconf -sconf <host>" output for 
all your machines, e.g.

# Print the load_sensor setting from each execution host's local config
for host in `qconf -sel`; do
  echo "$host"
  qconf -sconf "$host" | grep load_sensor
done

And when you find it, write yourself a note so that you don't have to go 
looking for it again in the future. :)
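
To make the "one specific host" point concrete, here is a minimal sketch of 
what a load sensor reporting a global value might look like.  It assumes the 
usual load sensor protocol (a report is requested by a newline on stdin, 
answered with a begin/end block of host:name:value lines, and "quit" ends 
the sensor) and that the host field "global" marks the value as 
cluster-wide.  count_free_licenses is a hypothetical placeholder that 
returns a fixed number; a real sensor would query the license server there.

```shell
#!/bin/sh
# Sketch of a load sensor for a global "fluentall" license count.
# Assumption: count_free_licenses is a made-up helper, not part of Grid Engine.
count_free_licenses() {
  echo 4   # fixed value for illustration only
}

run_load_sensor() {
  # Each line read from stdin is a request for one load report.
  while read -r line; do
    # "quit" asks the sensor to shut down.
    [ "$line" = "quit" ] && return 0
    echo "begin"
    # Host field "global" (assumed convention) makes this a cluster-wide value.
    echo "global:fluentall:$(count_free_licenses)"
    echo "end"
  done
}
```

Because the sensor itself labels the value as global, it only needs to run 
on one execd, which is why it would show up in a single host's local 
configuration.  To use the sketch as an actual sensor, the script would end 
with a call to run_load_sensor.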

Daniel

fanou wrote:
> Hi,
>
>   
>>> For the past 2 days, the jobs in my queue have not been launched.
>>> If I 'qstat' pending jobs, I get the following scheduling info:
>>> queue instance "all.q@master.cluster" dropped because it is full
>>>                             (-l fluentall=1) cannot run globally  
>>> because it offers only gl:fluentall=0.000000
>>>       
>> gl: means it's a load value. Is the process which returns this load  
>> still running (`ps` or alike) (defined in `qconf -sconf` entry  
>> "load_sensor")?
>>     
>
> "load_sensor" is set to none in `qconf -sconf`.
> I am not sure whether it was ever set to something else before.
>
> I am surprised that this resource "fluentall" is reported as a load value, since it should represent a software license.
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235302
