[GE users] Infiniband loadsensor

erilon78se erik.lonroth at scania.com
Mon Aug 23 15:40:52 BST 2010


OK, so if I don't do:

load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false load_avg=1.75]

then the @infiniband0 hosts will not be disabled once the load exceeds 1.75?

/Erik

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de]
Sent: 23 August 2010 16:16
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Infiniband loadsensor


Hi,

On 23.08.2010 at 16:00, erilon78se wrote:

> Hello Reuti!
>
> I would have expected a boolean complex to be referenced as true/false.
> I can understand the 1/0 mapping, but does it work the same way in:
>
> qconf -sq all.q
> ...
> load_thresholds       np_load_avg=1.75,[@infiniband0=ibstate=false]
>
> It seems to accept both versions.

yes, you can use either of them.

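N.B.: the statement above does not apply np_load_avg to the @infiniband0 hostgroup, because the bracketed entry replaces the default for those hosts. To keep the load limit there as well, use: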

load_thresholds np_load_avg=1.75,[@infiniband0=ibstate=false load_avg=1.75]

-- Reuti
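
The override behaviour in the note above can be modelled with a few lines of shell. This is only a sketch of the rule as described in this thread (a bracketed hostgroup entry replaces the default list instead of extending it), not the way sge_qmaster parses the attribute; the function name effective_thresholds is made up for illustration.

```shell
#!/bin/sh
# Model of the rule described above: for hosts in a hostgroup that has a
# bracketed entry, that entry is used INSTEAD of the default thresholds.
# effective_thresholds is a hypothetical helper, not an SGE command.
#   $1 = value of load_thresholds, $2 = hostgroup name (e.g. @infiniband0)
effective_thresholds() {
    # Pull out the text between "[<hostgroup>=" and the closing "]".
    override=$(printf '%s\n' "$1" | sed -n "s/.*\[$2=\([^]]*\)].*/\1/p")
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
    else
        # No entry for this hostgroup: keep everything before the first ",[".
        printf '%s\n' "${1%%,[[]*}"
    fi
}

effective_thresholds 'np_load_avg=1.75,[@infiniband0=ibstate=false]' @infiniband0
```

With the first form from this thread the call prints only ibstate=false, which is why the load limit has to be repeated inside the brackets if it should still apply to the @infiniband0 hosts.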


> /Erik
>
>
>
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 23 August 2010 15:28
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Infiniband loadsensor
>
>
> On 23.08.2010 at 15:25, reuti wrote:
>
>> Hi Erik,
>>
>> On 23.08.2010 at 15:02, erilon78se wrote:
>>
>>> Does anyone know how to disable a node in SGE/OGE if a custom "load
>>> sensor" detects that the state of the link is bad?
>>>
>>> * For some nodes this is "Ok", since they don't have infiniband.
>>> * For some nodes this is "Not Ok", since they have infiniband.
>>>
>>> I have implemented the Load Sensor and added a (bool) complex to
>>> SGE, but how do I disable a node which reports an inactive
>>> infiniband link?
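
For reference, a load sensor of the kind described above could be sketched as follows. The complex name ibstate is taken from later in this thread; the script name ib_loadsensor.sh, the IB_SYSFS variable, the use of uname -n for the host name, and the rule that any port reporting "4: ACTIVE" counts as a working link are assumptions for illustration, not details of Erik's implementation.

```shell
#!/bin/sh
# Hypothetical load sensor sketch reporting the boolean complex "ibstate".
# Assumption: port state is read from <IB_SYSFS>/<hca>/ports/<n>/state.
IB_SYSFS=${IB_SYSFS:-/sys/class/infiniband}

# Print 1 if at least one port reports "4: ACTIVE", otherwise 0.
ib_link_up() {
    for f in "$IB_SYSFS"/*/ports/*/state; do
        [ -r "$f" ] || continue          # also skips an unmatched glob
        case $(cat "$f") in
            "4: ACTIVE") echo 1; return 0 ;;
        esac
    done
    echo 0
}

# One report in the load sensor format: begin, host:complex:value, end.
report() {
    echo "begin"
    echo "$1:ibstate:$(ib_link_up)"
    echo "end"
}

# Protocol loop: each line read from stdin triggers a report,
# the string "quit" ends the sensor.
sensor_loop() {
    host=$(uname -n)   # assumption: matches the host name known to SGE
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        report "$host"
    done
}

# Start the loop only when installed under the assumed script name, so the
# functions above can also be tried out by hand.
case ${0##*/} in
    ib_loadsensor.sh) sensor_loop ;;
esac
```

Such a script would be registered through the load_sensor parameter of the host configuration (qconf -mconf <host>). The sensor reports 1/0 here; according to the rest of this thread the threshold can then be written with either 1/0 or true/false.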
>>
>> It should be possible to put a node into the alarm state, which in
>> effect is the same as disabling the node. An entry like:
>>
>> $ qconf -sq all.q
>> ...
>> load_thresholds       NONE,[@infiniband=0]
>
> Sorry, should read: NONE,[@infiniband=yourcomplex=0]
>
> -- Reuti
>
>
>> should do it. Depending on the logic you used, it might be necessary
>> to replace the 0 with 1. The faulty nodes should then show an "a" in
>> `qstat -f` in the column "states".
>>
>> -- Reuti
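
To check the result described above without reading the whole `qstat -f` listing, the queue instances in alarm state can be filtered out with a small helper. This is a sketch that assumes the usual column layout of `qstat -f`, with the queue instance name in the first column and "states" in the sixth; the function name alarm_instances is made up for illustration.

```shell
#!/bin/sh
# Print queue instances whose "states" column contains "a" (load alarm).
# Reads `qstat -f` output on stdin. Assumption: states is the sixth field
# and queue instance names have the form queue@host.
alarm_instances() {
    awk '$1 ~ /@/ && $6 ~ /a/ { print $1 }'
}

# Usage on a host with a working SGE environment:
#   qstat -f | alarm_instances
```

`qstat -f -explain a` additionally prints which threshold caused the alarm for each queue instance.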
>>
>>
>>> /Erik
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276248
>>>
>>> To unsubscribe from this discussion, e-mail:
>>> [users-unsubscribe at gridengine.sunsource.net].
>>>
>>
>>
>
>



