[GE users] Jobs waiting due to loss of resources

reuti reuti at staff.uni-marburg.de
Wed Dec 30 14:59:01 GMT 2009


On 30.12.2009 at 11:11, fanou wrote:

> Hi,
>> True !
>> qconf -sconf master.cluster reports a load sensor.
>> This script file uses some environment variables which have disappeared...
> That was a wrong conclusion. The environment variables are known to
> the load sensor.
> The load sensor was only set for the master. I added it to all nodes
> (all servers in `qconf -sel`). It seems to work fine now. :-)
> But I still don't understand why the configuration was lost.
> I have some questions:
> - Should the load sensor be set on all exec servers?

As it's a global load sensor, it's sufficient for it to run on just
one machine.

> - Supposing it was set before, could a bug in SGE 6.1 have caused
> this setting to be lost?

I don't think so.

Maybe the person who started the sgeexecd in former times just did a
`su` and not `su -`, hence the environment from his original session
was visible in the root session. Or he used `sudo` with a particular
setting in /etc/sudoers, or ssh'ed to the node while keeping the variable.
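
To illustrate the inheritance effect described above, here is a minimal Python sketch. It is not SGE code: the variable name `DEMO_SENSOR_VAR` and both helper functions are hypothetical, and the "fresh" case only approximates what a login shell (`su -`) does by dropping the one variable instead of rebuilding the whole environment.

```python
import os
import subprocess
import sys


def inherited_env(var, value):
    """Environment as after a plain `su`: the caller's variable is still there."""
    env = dict(os.environ)
    env[var] = value
    return env


def fresh_env(var):
    """Rough stand-in for `su -`: the caller's variable is not carried over."""
    return {k: v for k, v in os.environ.items() if k != var}


def child_sees(var, env):
    """Start a child process (think: daemon plus load sensor) and report var."""
    code = "import os, sys; sys.stdout.write(os.environ.get({!r}, '<unset>'))".format(var)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

A child started with `inherited_env(...)` sees the variable, one started with `fresh_env(...)` does not. A sensor that silently depended on such a variable would keep working only as long as the daemon happened to be started from a session that had it set.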

The best approach would of course be to make the load sensor script
self-contained, so that it does not rely on any preset variable. I also
recommend this rule to my users for any job script.
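
A self-contained sensor could look roughly like the following Python sketch. The monitored path, the complex name `scratch_free` and the fixed `PATH` are assumptions made up for illustration, not the original poster's script. The begin/end report format and the `global` host field follow my understanding of the Grid Engine load sensor interface, so please check them against the documentation of your installation.

```python
import os
import shutil

# Hypothetical settings: everything the sensor needs is defined here,
# not taken from whatever environment the execution daemon inherited.
SCRATCH_DIR = "/tmp"            # assumed directory to monitor
COMPLEX_NAME = "scratch_free"   # assumed complex, must be defined in `qconf -sc`
SAFE_PATH = "/usr/bin:/bin"     # fixed PATH for any helper programs


def build_report(scratch_dir=SCRATCH_DIR, complex_name=COMPLEX_NAME):
    """Return one load report block: begin, host:complex:value, end."""
    free_mb = shutil.disk_usage(scratch_dir).free // (1024 * 1024)
    # The host field "global" marks the value as a global load value.
    return "begin\nglobal:{}:{}\nend\n".format(complex_name, free_mb)


def run_sensor(stdin, stdout):
    """Answer every input line with a report; stop on 'quit' or end of input."""
    os.environ["PATH"] = SAFE_PATH  # overwrite instead of trusting the caller
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(build_report())
        stdout.flush()
```

To use it as an actual sensor, the file would end with a call such as `run_sensor(sys.stdin, sys.stdout)` after an `import sys`. Because all settings are set inside the script, it behaves the same whether the daemon was started via `su`, `su -`, `sudo` or a boot script.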

-- Reuti

> -- 
> Fanou
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235577
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

