[GE users] Advanced reservation for cluster outage?

s_kreidl sabine.kreidl at uibk.ac.at
Wed Jan 20 12:26:18 GMT 2010


Good to hear that handling cluster outages is an intended use of AR.
And thanks for the hint on the existing RFE. I will consider adding to 
that one, as soon as I am clear about what I'd actually want from "qstat 
-j". However, I can absolutely confirm the unhelpful "PE offers only 0 
slots" messages in my situation.
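
In the meantime, a small wrapper might spare users the detour. The
following is only a minimal sketch, not something we run in production:
the function name why_pending is made up for illustration, and it does
nothing more than call "qstat -j" for the job and then the qrstat
command Reuti suggested, so that the scheduler messages and the list of
ARs appear side by side.

```shell
# Hypothetical helper (the name why_pending is an assumption, not an SGE tool):
# print the scheduler messages for one job, then list the advanced
# reservations of all users so a blocking AR is easy to spot.
why_pending() {
    if [ "$#" -ne 1 ]; then
        echo "usage: why_pending <job_id>" >&2
        return 2
    fi
    echo "== Scheduler messages for job $1 =="
    qstat -j "$1"
    echo "== Advanced reservations (all users) =="
    qrstat -u "*"
}
```

A user would call it as "why_pending <job_id>". It does not try to
decide whether an AR is actually the cause of the pending state; it only
puts both outputs in one place.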

Regards,
Sabine

reuti wrote:
> On 19.01.2010 at 17:21, s_kreidl wrote:
>
>   
>> Hi Reuti,
>>
>> thanks for the quick reply. Yes, of course, qrstat is indeed the
>> standard way of getting information about ARs.
>>
>> However, I find it a rather long way round for a user to go looking
>> for ongoing advanced reservations because of a pending job when there
>> are no hints in the "qstat -j" messages and none from any other
>> qstat request. (And to be honest, I'm rather reluctant to write another
>> piece of documentation for the rare occasions of cluster outages for
>> which we (mis-?)use the AR feature ;-) ).
>>     
>
> No, it's an intended use IMO.
>
>
>   
>> Don't you think some kind of RFE would be appropriate?
>>     
>
> There is already an RFE which you could extend:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=224
>
> It's also the case that sometimes you only see that the PE offers
> only 0 slots, and it's not easy to find the cause. A redesign of
> qstat (or better: of its scheduler output) would be an improvement.
>
> -- Reuti
>
>
>   
>> Best,
>> Sabine
>>
>> reuti wrote:
>>     
>>> Hi,
>>>
>>> On 19.01.2010 at 16:57, s_kreidl wrote:
>>>
>>>
>>>       
>>>> I somehow got the AR working as expected with SGE 6.2u3:
>>>>
>>>> qrsub -a 01291200 -e 01291800 -pe "openmpi-8perhost" 1008 -q "*@*" -u my_user
>>>>
>>>> The problem I encounter now is that users have a hard time finding
>>>> out anything about the existing AR:
>>>>
>>>> 1. "qhost -q" shows the reserved slots for one of the two queues
>>>> (par.q) we have, but shows nothing for the other queue (all.q -
>>>> historic reasons), for which the reservation obviously does have
>>>> the desired consequences too.
>>>>
>>>> 2. "qstat -j" gives no hint on any ongoing reservation for parallel
>>>> pending jobs (only jobs explicitly sent to the "non-reserved" queue
>>>> all.q do show "cannot run at host [...] due to a reservation"
>>>> messages)
>>>>
>>>> 3. "qstat -f" shows no reservation in the triple slot display of
>>>> any queue instance
>>>>
>>>> 4. "qstat -g c" shows no reservation at all
>>>>
>>>>         
>>> Does
>>>
>>> $ qrstat -u "*"
>>>
>>> (note the r in qrstat) help?
>>>
>>> -- Reuti
>>>
>>>
>>>       
>>>> I do have two questions/concerns now:
>>>>
>>>> 1. Am I missing some standard procedure making ARs visible to the
>>>> user as a reason for their pending jobs - is an update to 6.2u5
>>>> necessary?
>>>>
>>>> 2. If not, I'd like to make an RFE of some kind, but as I
>>>> understand too little about the internal workings of SGE and AR,
>>>> I'd like to put this to discussion.
>>>>
>>>>
>>>> Any thoughts would be much appreciated.
>>>> Thanks,
>>>> Sabine
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239747
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>>>>         
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239929

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


