[GE users] Infiniband loadsensor

reuti reuti at staff.uni-marburg.de
Mon Aug 23 15:49:38 BST 2010


On 23.08.2010 at 16:40, erilon78se wrote:

> OK, so if I don't do:
> 
> load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false np_load_avg=1.75]
> 
> then the @infiniband0 hosts will not be disabled once the load exceeds 1.75?

Exactly. The default (the one w/o exechost/hostgroup specification) will be overridden by exechost/hostgroup specific ones. It's mentioned near the top of `man queue_conf` in more detail.
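
To make the override rule concrete, here is a minimal Python sketch of how the effective thresholds for a host could be resolved. It is a simplified model for illustration only, not SGE's actual parser; the names `DEFAULT`, `OVERRIDES` and `effective_thresholds` are made up for this example.

```python
# Simplified model of the override rule in queue_conf; not SGE's real code.
# Models: load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false]
DEFAULT = {"np_load_avg": "1.75"}
OVERRIDES = {"@infiniband0": {"ibstate": "false"}}  # hostgroup-specific entry


def effective_thresholds(hostgroups, default=DEFAULT, overrides=OVERRIDES):
    """Return the thresholds applying to a host that is in the given hostgroups."""
    for group in hostgroups:
        if group in overrides:
            # A specific entry replaces the default list; it is not merged with it.
            return dict(overrides[group])
    return dict(default)
```

In this model a host in @infiniband0 ends up with only the ibstate threshold, which is why np_load_avg has to be repeated inside the brackets if it should still apply there.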

-- Reuti


> 
> /Erik
> 
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 23 August 2010 16:16
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Infiniband loadsensor
> 
> 
> Hi,
> 
> On 23.08.2010 at 16:00, erilon78se wrote:
> 
>> Hello Reuti!
>> 
>> I would have expected the boolean complex to be referenced as
>> true/false. I can understand the 1/0 mapping, but does the same
>> apply to:
>> 
>> qconf -sq all.q
>> ...
>> load_thresholds       np_load_avg=1.75,[@infiniband0=ibstate=false]
>> 
>> It seems to accept both versions.
> 
> Yes, you can use either of them.
> 
> N.B.: the above statement will not set np_load_avg for the @infiniband0 hostgroup. To keep that threshold there as well, it has to be repeated in the hostgroup entry:
> 
> load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false np_load_avg=1.75]
> 
> -- Reuti
> 
> 
>> /Erik
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 23 August 2010 15:28
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Infiniband loadsensor
>> 
>> 
>> On 23.08.2010 at 15:25, reuti wrote:
>> 
>>> Hi Erik,
>>> 
>>> On 23.08.2010 at 15:02, erilon78se wrote:
>>> 
>>>> Does anyone know how to disable a node in SGE/OGE if a custom "load
>>>> sensor" detects that the state of the link is bad?
>>>> 
>>>> * For some nodes this is "Ok", since they don't have infiniband.
>>>> * For some nodes this is "Not Ok", since they have infiniband.
>>>> 
>>>> I have implemented the Load Sensor and added a (bool) complex to
>>>> SGE, but how do I disable a node which reports an inactive
>>>> infiniband link?
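
For readers who have not written a load sensor yet, the following is a minimal Python sketch of the kind of script described above. It follows the usual load sensor protocol: wait for a line on stdin, stop on "quit", otherwise print a report framed by "begin" and "end" with one "host:complex:value" line. The complex name `ibstate` is taken from later in this thread; `read_state` is a hypothetical callable returning the port state text, and the check for "ACTIVE" assumes text such as the kernel's InfiniBand sysfs `state` file provides.

```python
def ib_link_active(state_text):
    """Return True if the port state text reports an active link (assumed format)."""
    return "ACTIVE" in state_text.upper()


def run_load_sensor(stdin, stdout, hostname, read_state, complex_name="ibstate"):
    """Answer each request from the execution daemon until 'quit' or end of input."""
    for line in stdin:
        if line.strip() == "quit":
            break
        value = "true" if ib_link_active(read_state()) else "false"
        # One report per request, framed by begin/end.
        stdout.write("begin\n")
        stdout.write(f"{hostname}:{complex_name}:{value}\n")
        stdout.write("end\n")
        stdout.flush()  # the daemon reads from a pipe, so flush every report
```

A real script would call something like `run_load_sensor(sys.stdin, sys.stdout, hostname, read_state)`, where `hostname` has to match the name under which the execution host is known to SGE.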
>>> 
>>> it should be possible to put a node into alarm state, which is in the
>>> end like disabling the node. An entry like:
>>> 
>>> $ qconf -sq all.q
>>> ...
>>> load_thresholds       NONE,[@infiniband=0]
>> 
>> Sorry, should read: NONE,[@infiniband=yourcomplex=0]
>> 
>> -- Reuti
>> 
>> 
>>> should do it. Depending on the logic you used, it might be necessary
>>> to replace the 0 with 1. The faulty nodes should then show an "a" in
>>> `qstat -f` in the column "states".
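
As a rough way to check this from a script, the sketch below picks out queue instances whose "states" column contains an "a" from captured `qstat -f` text. The column layout and the sample output are assumptions made for illustration (the host names are invented), so the parsing may need adjusting for a given installation.

```python
# Made-up sample in the assumed column layout of `qstat -f`.
SAMPLE = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node01                   BIP   0/0/8          0.01     lx24-amd64
---------------------------------------------------------------------------------
all.q@node02                   BIP   0/0/8          0.02     lx24-amd64    a
"""


def queues_in_alarm(qstat_f_output):
    """Return the queue instances whose states column contains 'a'."""
    alarmed = []
    for line in qstat_f_output.splitlines():
        fields = line.split()
        # Queue-instance lines start with "queue@host" and only have a sixth
        # field when the states column is non-empty.
        if len(fields) == 6 and "@" in fields[0] and not line[0].isspace():
            if "a" in fields[5]:
                alarmed.append(fields[0])
    return alarmed
```

With the sample above, only all.q@node02 is reported.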
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> /Erik
>>>> 
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276248
>>>> 
>>>> To unsubscribe from this discussion, e-mail:
>>>> [users-unsubscribe at gridengine.sunsource.net].
>>>> 
>>> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276284

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


