[GE users] Advanced Reservation question

reuti reuti at staff.uni-marburg.de
Fri Mar 12 21:23:13 GMT 2010


On 12.03.2010 at 20:48, mhanby wrote:

> This happens with both users, so apparently the user thing wasn't an
> issue.
>
> It looks like the cutoff happens between 09:09:59 and 09:10:00
>
> $ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=09:09:59
> Your job 111005 ("STDIN") has been submitted
>
> $ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=09:10:59
> Unable to run job: error: no suitable queues.
> Exiting.
>
> I also tried using "h_rt=32999" and "h_rt=33000" with the same
> results.
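The cutoff above was found by trying h_rt values by hand. Because qsub also accepts h_rt as plain seconds (h_rt=32999 vs. h_rt=33000 above), the search can be written as an ordinary binary search. A minimal sketch, assuming the rejection is monotone (everything up to the cutoff is accepted, everything above it is rejected) and that qsub exits non-zero when it prints "Unable to run job"; `find_cutoff` and `probe_ar15` are hypothetical helper names, not Grid Engine commands.

```shell
#!/bin/sh
# Binary search for the largest value in [lo, hi] that a probe command accepts.
# Invariant: "$probe" succeeds for lo and fails for hi.
# Assumes the probe is monotone (accepts everything up to a cutoff, rejects above).
find_cutoff() {
    probe=$1
    lo=$2
    hi=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(((lo + hi) / 2))
        if "$probe" "$mid"; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

# Hypothetical probe: submits a held test job into AR 15 with h_rt=<seconds>.
# Assumes qsub exits non-zero when it reports "no suitable queues".
# Every accepted probe leaves a held job behind; remove those with qdel afterwards.
probe_ar15() {
    echo /bin/hostname | qsub -h -ar 15 -pe lam_loose_rsh 32 -l h_rt="$1" >/dev/null 2>&1
}

# Example (not run here): 1:30:00 = 5400 s was accepted, 15:30:00 = 55800 s was not.
# find_cutoff probe_ar15 5400 55800
```

Each step halves the interval, so narrowing the 5400 to 55800 s range down to a single second takes about 16 probes.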

Yep, I can confirm this. But for me the limit is 9:09:00, i.e. 32940 seconds.
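The seconds values quoted in this thread follow from h_rt = hours*3600 + minutes*60 + seconds, e.g. 9:09:59 is 9*3600 + 9*60 + 59 = 32999 and 9:09:00 is 32940. A small POSIX sh helper for the conversion, as a sketch; `to_seconds` is a hypothetical name, and it assumes well-formed input of at most three colon-separated fields.

```shell
#!/bin/sh
# Convert an h_rt value written as [[HH:]MM:]SS (or plain seconds) to seconds.
# Sketch only: no validation of the input is done.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1          # split the argument on ':' into $1, $2, ...
    IFS=$old_ifs
    total=0
    for field in "$@"; do
        # Drop leading zeros so "09" is not parsed as an (invalid) octal number.
        while [ "${field#0}" != "$field" ]; do
            field=${field#0}
        done
        total=$((total * 60 + ${field:-0}))
    done
    echo "$total"
}

to_seconds 09:09:59    # 32999
to_seconds 09:10:00    # 33000
to_seconds 09:09:00    # 32940
```

For comparison, the AR duration of 225:59:00 comes to 813540 seconds, so the rejected 15:30:00 request (55800 s) is far inside the reservation window.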

-- Reuti


> So what could be preventing an AR of 225 hours from handling jobs
> that run longer than 9 hours 9 minutes and 59 seconds?
>
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Friday, March 12, 2010 1:40 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Advanced Reservation question
>
> I think I've narrowed it down to the h_rt/s_rt for the job in
> question.
>
> The duration for the AR is 225 hours.
>
> If the user requests 1.5 hours it works:
>
> $ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=01:30:00
> Your job 110985 ("STDIN") has been submitted
>
> However, if he requests 15.5 hours of runtime it fails:
> $ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=15:30:00
> Unable to run job: error: no suitable queues.
> Exiting.
>
> What could be blocking the usage of the nodes that are reserved for
> 225 hours?
>
> If the user submits the job without the AR request (and there are 32
> slots available) it works, so the h_rt isn't a limit set globally.
>
> Mike
>
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Friday, March 12, 2010 1:18 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Advanced Reservation question
>
> Also, if I run qstat as the jdoe user it shows 64 reserved:
>
> [jdoe@cluster1]$ qstat -g c
> CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
> --------------------------------------------------------------------------------
> all.q                             0.66    128     64     64    192      0      0
>
> And the user can view the AR using qrstat if he specifies it by id:
>
> [jdoe@cluster1]$ qrstat -ar 15
> --------------------------------------------------------------------------------
> id                             15
> name                           testAR
> owner                          mikeh
> state                          r
> start_time                     03/12/2010 13:00:00
> end_time                       03/21/2010 23:59:00
> duration                       225:59:00
> submission_time                03/12/2010 11:36:36
> group                          sge
> account                        sge
> granted_slots_list             all.q@compute-1-4.local=8,all.q@compute-0-8.local=8,all.q@compute-0-7.local=8,all.q@compute-0-3.local=8,all.q@compute-0-12.local=8,all.q@compute-0-10.local=8,all.q@compute-0-5.local=3,all.q@compute-0-6.local=8,all.q@compute-0-14.local=5
> granted_parallel_environment   lam_loose_rsh slots 64
> mail_options                   abe
> acl_list                       mikeh,jdoe
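As a sanity check, the per-host counts in granted_slots_list should add up to the slots granted to the PE: here seven hosts with 8 slots plus 3 and 5, i.e. 64. A sketch that sums them; `sum_granted_slots` is a hypothetical helper, and it assumes the list is fed in as queue@host=N entries separated by commas (line breaks from wrapping are stripped first).

```shell
#!/bin/sh
# Sum the slot counts in a granted_slots_list value read from stdin.
# Entries look like queue@host=N and are separated by commas.
sum_granted_slots() {
    tr -d '\n' | tr ',' '\n' | awk -F= 'NF == 2 { total += $2 } END { print total + 0 }'
}

printf '%s\n' 'all.q@compute-0-5.local=3,all.q@compute-0-14.local=5' | sum_granted_slots   # 8
```

For AR 5 further down, whose list holds only all.q@compute-0-8.local=1, the same check gives 1, matching the single slot that was actually reserved.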
>
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Friday, March 12, 2010 1:14 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Advanced Reservation question
>
> If I run qrstat as my user, it prints the AR; if I run it as jdoe (the
> other user on the ACL), qrstat prints nothing:
>
> [mikeh@cluster1]$ qrstat
> ar-id   name       owner        state start at             end at               duration
> ------------------------------------------------------------------------------------------
>     15 testAR     mikeh       r     03/12/2010 13:00:00  03/21/2010 23:59:00  225:59:00
>
> [jdoe@cluster1]$ qrstat
>
> Strange
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, March 12, 2010 1:12 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Advanced Reservation question
>
> On 12.03.2010 at 19:54, mhanby wrote:
>
>> Hmm, next question.
>>
>> If I (the creator of the AR) submit a job requesting the
>> reservation, it works fine:
>>
>> [mikeh@cluster1]$ qsub -ar 15 qsub_lam_hello.sh
>> Your job 110972 ("j_lam_hello") has been submitted
>>
>> However, if the other user on the access list for the AR tries to
>> submit a job:
>>
>> [jdoe@cluster1]$ qsub -ar 15 lammps-job.sh
>> Unable to run job: error: no suitable queues.
>> Exiting.
>
> What is the state of the AR:
>
> $ qrstat
>
> I would assume it's in state "E" instead of "r" because of the
> following bug:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3227
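To check just the state field, the qrstat -ar output can be filtered. A sketch, with `ar_state` as a hypothetical helper name; it assumes the "name value" layout shown in the qrstat -ar output elsewhere in this thread.

```shell
#!/bin/sh
# Print the value of the "state" line from "qrstat -ar <id>" output on stdin.
ar_state() {
    awk '$1 == "state" { print $2; exit }'
}

# Hypothetical usage against a live cluster (not run here):
#   case "$(qrstat -ar 15 | ar_state)" in
#       r) echo "AR 15 is running" ;;
#       E) echo "AR 15 is in error state; see issue 3227" ;;
#       *) echo "AR 15 is in some other state" ;;
#   esac
```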
>
> -- Reuti
>
>
>> $ qrstat -ar 15 |grep acl
>> acl_list                       mikeh,jdoe
>>
>> According to the man page, the user list is a comma-separated list of
>> UNIX users.
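Since acl_list is a plain comma-separated list, membership can be checked by matching the user name between commas. A sketch; `in_acl_list` is a hypothetical helper, and it only compares literal names, so it says nothing about how Grid Engine itself resolves the list.

```shell
#!/bin/sh
# Succeed if user $1 appears as an exact entry in the comma-separated list $2.
in_acl_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage (not run here):
#   acl=$(qrstat -ar 15 | awk '$1 == "acl_list" { print $2 }')
#   in_acl_list "$(id -un)" "$acl" && echo "current user is on the AR's acl_list"
```

With the list shown above, both mikeh and jdoe match, so the name list itself looks correct.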
>>
>> Any suggestions?
>>
>> I'd prefer that the administrators create reservations for the users.
>>
>> Mike
>> -----Original Message-----
>> From: mhanby [mailto:mhanby at uab.edu]
>> Sent: Friday, March 12, 2010 11:48 AM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] Advanced Reservation question
>>
>> Thanks Reuti, that did the trick.
>>
>> AR is pretty darn cool!
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Friday, March 12, 2010 3:46 AM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Advanced Reservation question
>>
>> Hi,
>>
>> On 12.03.2010 at 00:03, mhanby wrote:
>>
>>> I'm just trying out reservations for the first time.
>>>
>>> GE 6.2u5
>>>
>>> I'm having trouble creating an advanced reservation for a user
>>> requesting 8 nodes with 8 slots each:
>>>
>>> $ qrsub -q '*@compute-0-1,*@compute-0-10,*@compute-0-12,*@compute-0-4,*@compute-0-6,*@compute-0-8,*@compute-0-9,*@compute-1-8' \
>>>     -u jdoe,mikeh -a 201003120100 -e 201003211200 -m bea -M mhanby at uab.edu -N Brain01
>>
>> The syntax is the same as for a normal qsub.
>>
>> In essence, you need to request "-pe orte 64" or similar to reserve
>> more than one slot; the PE's allocation rule is used to reserve the
>> slots. What you then submit into this AR is unrelated, i.e. it can be
>> a serial job or one with a different allocation rule, which will then
>> be used inside the granted slots *).
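Following this advice, the original request would add a -pe request so that qrsub reserves all 64 slots instead of a single one. A sketch of building the host-restricted queue list; `host_queue_list` is a hypothetical helper, and reusing the lam_loose_rsh PE (used elsewhere in this thread) is an assumption. Whether the 64 slots land as 8 per host depends on that PE's allocation_rule (a fixed value of 8 would do it), which the thread does not show.

```shell
#!/bin/sh
# Build a -q argument of the form '*@host1,*@host2,...' from a list of host names.
host_queue_list() {
    out=
    for host in "$@"; do
        out="${out:+$out,}*@$host"
    done
    printf '%s\n' "$out"
}

hosts="compute-0-1 compute-0-10 compute-0-12 compute-0-4 compute-0-6 compute-0-8 compute-0-9 compute-1-8"
# $hosts is left unquoted on purpose so it splits into one argument per host.
queues=$(host_queue_list $hosts)
echo "$queues"

# Hypothetical corrected request (not run here), reserving 8 x 8 = 64 slots via a PE:
#   qrsub -pe lam_loose_rsh 64 -q "$queues" -u jdoe,mikeh \
#       -a 201003120100 -e 201003211200 -m bea -N Brain01
```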
>>
>> -- Reuti
>>
>> *) At least that was the behavior when I last checked. Now it seems
>> to always use the allocation_rule of the PE of the reservation,
>> although the output states:
>>
>> parallel environment:  smp range: 2
>>
>> with "allocation_rule $pe_slots", and I get one slot on each of two
>> machines. IMO it's a bug. It should either reject the job (no suitable
>> queue in the AR, PEs don't match), or use the allocation_rule of the
>> job inside the granted slots of the AR (of course with the possibility
>> that the job can't be scheduled inside the reservation).
>>
>>
>>> Your advance reservation 5 has been granted
>>>
>>> These are 8 nodes that are currently running jobs, but they will be
>>> free by the time the reservation starts.
>>>
>>> $ qrstat -ar 5
>>> --------------------------------------------------------------------------------
>>> id                             5
>>> name                           Brain01
>>> owner                          mikeh
>>> state                          W
>>> start_time                     03/12/2010 01:00:00
>>> end_time                       03/21/2010 12:00:00
>>> duration                       226:00:00
>>> submission_time                03/11/2010 16:53:14
>>> group                          sge
>>> account                        sge
>>> granted_slots_list             all.q@compute-0-8.local=1
>>> mail_options                   abe
>>> mail_list                      root at localhost
>>> acl_list                       jdoe,mikeh
>>>
>>> The output seems to indicate that the reservation is only being
>>> created for one host and a single slot, instead of 8 hosts each
>>> with 8 slots.
>>>
>>> Do I have the syntax wrong for GE6.2u5?
>>>
>>> Also, is there a way to modify an existing AR? I don't see a
>>> 'qrmod' command.
>>>
>>>
>>> =================================
>>> Mike Hanby
>>> mhanby at uab.edu
>>> Information Systems Specialist II
>>> IT HPCS / Research Computing
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248081
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>



