[GE users] Infiniband loadsensor

reuti reuti at staff.uni-marburg.de
Mon Aug 23 15:16:16 BST 2010


Hi,

On 23.08.2010 at 16:00, erilon78se wrote:

> Hello Reuti!
> 
> I would believe that the boolean variable would be referenced as a "true/false" thingy. I can understand the mapping 1/0, but would it be the same thing as for:
> 
> qconf -sq all.q
> ...
> load_thresholds       np_load_avg=1.75,[@infiniband0=ibstate=false]
> 
> It seems to accept both versions.

yes, you can use either of them.

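N.B.: the statement above will not set np_load_avg for the @infiniband0 hostgroup, because the bracketed entry replaces the default for those hosts. To keep both thresholds there, use:

load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false,np_load_avg=1.75]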
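
For readers who still need the sensor side, here is a minimal sketch of a load sensor that could feed the ibstate complex used above. It follows the usual load sensor handshake: wait for a line on stdin, exit on "quit", otherwise print one begin/end block with host:complex:value lines. The sysfs path, the device name mlx4_0, the port number, and the use of socket.gethostname() for the host name are assumptions for illustration, not taken from this thread; Erik's actual sensor may look quite different.

```python
#!/usr/bin/env python3
"""Sketch of a Grid Engine load sensor reporting an InfiniBand link state."""
import socket
import sys
from pathlib import Path

# Assumption: device name and port number; adjust for the actual adapter.
STATE_FILE = Path("/sys/class/infiniband/mlx4_0/ports/1/state")
COMPLEX_NAME = "ibstate"  # the BOOL complex discussed in this thread


def link_is_active(state_text):
    """Return True if a state string such as '4: ACTIVE' means the link is up."""
    return state_text.split(":", 1)[-1].strip().upper() == "ACTIVE"


def read_state(path=STATE_FILE):
    """Read the state file; a missing or unreadable file counts as inactive."""
    try:
        return link_is_active(path.read_text())
    except OSError:
        return False


def format_report(host, active):
    """Build one report block in the begin/end framing that load sensors use."""
    value = 1 if active else 0
    return f"begin\n{host}:{COMPLEX_NAME}:{value}\nend\n"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Answer each request line with one report until 'quit' arrives."""
    host = socket.gethostname()  # assumption: matches the name SGE uses for the host
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, read_state()))
        stdout.flush()
```

To use it as a sensor, add a call to main() at the end of the file and point the load_sensor entry of the host configuration at the script. Hosts without InfiniBand would report 0 with this sketch, which is harmless as long as the ibstate threshold is only set for the @infiniband0 hostgroup as shown above.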

-- Reuti


> /Erik
> 
> 
> 
> 
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 23 August 2010 15:28
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Infiniband loadsensor
> 
> 
> On 23.08.2010 at 15:25, reuti wrote:
> 
>> Hi Erik,
>> 
>> On 23.08.2010 at 15:02, erilon78se wrote:
>> 
>>> Does anyone know how to disable a node in SGE/OGE if a custom "load
>>> sensor" detects that the state of the link is bad?
>>> 
>>> * For some nodes this is "Ok", since they don't have infiniband.
>>> * For some nodes this is "Not Ok" since they have infiniband.
>>> 
>>> I have implemented the Load Sensor and added a (bool) complex to SGE,
>>> but how do I disable a node which reports an inactive infiniband
>>> link?
>> 
>> it should be possible to put a node into alarm state, which is in the
>> end like disabling the node. An entry like:
>> 
>> $ qconf -sq all.q
>> ...
>> load_thresholds       NONE,[@infiniband=0]
> 
> Sorry, should read: NONE,[@infiniband=yourcomplex=0]
> 
> -- Reuti
> 
> 
>> should do it. Depending on the logic you used, it might be necessary
>> to replace the 0 with 1. The faulty nodes should then show an "a" in
>> `qstat -f` in the column "states".
>> 
>> -- Reuti
>> 
>> 
>>> /Erik
>>> 
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276248
>>> 
>>> To unsubscribe from this discussion, e-mail:
>>> [users-unsubscribe at gridengine.sunsource.net].
>>> 
>> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276268

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


