[GE users] Advanced reservation for cluster outage?

reuti reuti at staff.uni-marburg.de
Tue Jan 19 21:25:35 GMT 2010


On 19.01.2010 at 17:21, s_kreidl wrote:

> Hi Reuti,
>
> thanks for the quick reply. Yes, of course, qrstat is indeed the
> standard way of getting information about ARs.
>
> However, I find it a rather long way for a user to go: looking for
> ongoing advance reservations because of a pending job, when there are
> no hints in the "qstat -j" messages and no hints from any other
> qstat request. (And to be honest, I'm rather reluctant to write another
> piece of documentation for the rare occasions of cluster outages for
> which we (mis-?)use the AR feature ;-) ).

No, it's an intended use IMO.


> Don't you think some kind of RFE would be appropriate?

There is already an RFE which you could extend:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=224

Similarly, sometimes all you see is that the PE offers 0 slots, and
it's not easy to find the cause. A redesign of qstat (or better: of
its scheduler output) would be an improvement.
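
Until qstat reports this itself, a small wrapper is one way to put the
reservation list in front of users. A minimal sketch, assuming a POSIX
shell with the Grid Engine binaries on the PATH; the function name
show_reservations is hypothetical, and only the qrstat -u "*" call
comes from this thread:

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): list the advance
# reservations of all users, so that someone with a pending job can
# check whether an AR is holding the slots.
show_reservations() {
    # Fail with a readable message if the SGE environment is not loaded.
    if ! command -v qrstat >/dev/null 2>&1; then
        echo "qrstat not found; is the Grid Engine environment loaded?" >&2
        return 1
    fi
    # The quotes keep the shell from expanding * into file names.
    qrstat -u "*"
}
```

Users could call show_reservations whenever a parallel job stays
pending without an explanation in "qstat -j".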

-- Reuti


>
> Best,
> Sabine
>
> reuti wrote:
>> Hi,
>>
>> On 19.01.2010 at 16:57, s_kreidl wrote:
>>
>>
>>> I somehow got the AR working as expected with SGE 6.2u3 (qrsub -a
>>> 01291200 -e 01291800 -pe "openmpi-8perhost" 1008 -q "*@*" -u my_user)
>>>
>>> The problem I encounter now is that users have a hard time finding
>>> out anything about the existing AR:
>>>
>>> 1. "qhost -q" shows the reserved slots for one of the two queues
>>> (par.q) we have, but shows nothing for the other queue (all.q -
>>> historic reasons), for which the reservation obviously does have
>>> the desired consequences too.
>>>
>>> 2. "qstat -j" gives no hint on any ongoing reservation for parallel
>>> pending jobs (only jobs explicitly sent to the "non-reserved" queue
>>> all.q do show "cannot run at host [...] due to a reservation"
>>> messages)
>>>
>>> 3. "qstat -f" shows no reservation in the triple slot display of
>>> any queue instance
>>>
>>> 4. "qstat -g c" shows no reservation at all
>>>
>>
>> Does
>>
>> $ qrstat -u "*"
>>
>> (note the "r": qrstat, not qstat) help?
>>
>> -- Reuti
>>
>>
>>> I do have two questions/concerns now:
>>>
>>> 1. Am I missing some standard procedure that makes ARs visible to
>>> users as the reason for their pending jobs? Is an update to 6.2u5
>>> necessary?
>>>
>>> 2. If not, I'd like to file an RFE of some kind, but as I
>>> understand too little about the internal workings of SGE and AR,
>>> I'd like to put this up for discussion.
>>>
>>>
>>> Any thoughts would be much appreciated.
>>> Thanks,
>>> Sabine
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239747
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239798

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


