[GE users] Advanced Reservation question

mhanby mhanby at uab.edu
Fri Mar 12 19:39:46 GMT 2010


I think I've narrowed it down to the h_rt/s_rt for the job in question.

The duration for the AR is 225 hours.

If the user requests 1.5 hours it works:

$ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=01:30:00
Your job 110985 ("STDIN") has been submitted

However, if he requests 15.5 hours of runtime it fails:

$ echo `/bin/hostname` | qsub -ar 15 -pe lam_loose_rsh 32 -l h_rt=15:30:00
Unable to run job: error: no suitable queues.
Exiting.

What could be blocking the usage of the nodes that are reserved for 225 hours?

If the user submits the job without the AR request (and there are 32 slots available) it works, so the h_rt isn't a limit set globally.
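
As a quick sanity check on the numbers above, the sketch below converts both h_rt requests and the AR duration reported by qrstat (225:59:00) to seconds and compares them. It only checks the arithmetic. The helper name to_seconds is made up for this example, and the comparison says nothing about how the scheduler itself evaluates h_rt against a reservation.

```shell
#!/bin/sh
# Convert an H:M:S string to seconds. awk reads "08" as decimal 8,
# so leading zeros are safe here.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

ar_duration=$(to_seconds 225:59:00)   # duration field from "qrstat -ar 15"

for h_rt in 01:30:00 15:30:00; do
    secs=$(to_seconds "$h_rt")
    if [ "$secs" -le "$ar_duration" ]; then
        echo "h_rt=$h_rt ($secs s) is shorter than the AR ($ar_duration s)"
    else
        echo "h_rt=$h_rt ($secs s) is longer than the AR ($ar_duration s)"
    fi
done
```

Both requests come out far below the AR length (5400 s and 55800 s against 813540 s), so the length of the reservation on its own does not explain the rejection.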

Mike

-----Original Message-----
From: mhanby [mailto:mhanby at uab.edu] 
Sent: Friday, March 12, 2010 1:18 PM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Advanced Reservation question

Also, if I run qstat as the jdoe user it shows 64 reserved:

[jdoe@cluster1]$ qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.66    128     64     64    192      0      0

And the user can view the AR using qrstat if he specifies it by id:

[jdoe@cluster1]$ qrstat -ar 15
--------------------------------------------------------------------------------
id                             15
name                           testAR
owner                          mikeh
state                          r
start_time                     03/12/2010 13:00:00
end_time                       03/21/2010 23:59:00
duration                       225:59:00
submission_time                03/12/2010 11:36:36
group                          sge
account                        sge
granted_slots_list             all.q@compute-1-4.local=8,all.q@compute-0-8.local=8,all.q@compute-0-7.local=8,all.q@compute-0-3.local=8,all.q@compute-0-12.local=8,all.q@compute-0-10.local=8,all.q@compute-0-5.local=3,all.q@compute-0-6.local=8,all.q@compute-0-14.local=5
granted_parallel_environment   lam_loose_rsh slots 64
mail_options                   abe
acl_list                       mikeh,jdoe
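
To cross-check the 64 in the RES column against the reservation itself, the per-host entries in granted_slots_list can be added up. The sketch below works on a copy of the list pasted into a variable (the list archive rewrote "@" as " at "; the code splits on "," and "=", so either spelling works). The variable names are made up for this example.

```shell
#!/bin/sh
# granted_slots_list as reported by "qrstat -ar 15"
granted='all.q@compute-1-4.local=8,all.q@compute-0-8.local=8,all.q@compute-0-7.local=8,all.q@compute-0-3.local=8,all.q@compute-0-12.local=8,all.q@compute-0-10.local=8,all.q@compute-0-5.local=3,all.q@compute-0-6.local=8,all.q@compute-0-14.local=5'

# Put one queue instance per line, then count the entries and add up
# the slot number that follows each "=".
summary=$(printf '%s\n' "$granted" | tr ',' '\n' |
    awk -F= '{ n++; sum += $2 } END { print n, sum }')
instances=${summary% *}
total_slots=${summary#* }

echo "queue instances: $instances, reserved slots: $total_slots"
```

This reports 9 queue instances and 64 slots, matching the granted_parallel_environment line. Two of the hosts (compute-0-5 and compute-0-14) are only partly reserved, with 3 and 5 slots.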

-----Original Message-----
From: mhanby [mailto:mhanby at uab.edu] 
Sent: Friday, March 12, 2010 1:14 PM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Advanced Reservation question

If I run qrstat as my own user, it prints the AR. If I run it as jdoe (the other user on the ACL), the output is empty:

[mikeh@cluster1]$ qrstat
ar-id   name       owner        state start at             end at               duration
------------------------------------------------------------------------------------------
     15 testAR     mikeh       r     03/12/2010 13:00:00  03/21/2010 23:59:00  225:59:00

[jdoe@cluster1]$ qrstat

Strange
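
One possible explanation, to be checked against qrstat(1) on the installed version: as far as I recall the man page, qrstat without options lists only reservations created by the calling user (a default of "-u $user"), and "qrstat -u mikeh" or "qrstat -u '*'" widens the listing. Being on the acl_list would then not make the AR show up in jdoe's default listing. The sketch below does not call qrstat. It simulates that owner filter on the row shown above, and list_for_owner is a hypothetical helper.

```shell
#!/bin/sh
# The single AR row from the qrstat listing: id, name, owner, state, ...
listing='15 testAR mikeh r 03/12/2010 13:00:00 03/21/2010 23:59:00 225:59:00'

# Hypothetical helper: keep only rows whose owner column (field 3)
# equals the given user, mimicking an assumed default of "-u $user".
list_for_owner() {
    printf '%s\n' "$listing" | awk -v owner="$1" '$3 == owner'
}

as_mikeh=$(list_for_owner mikeh)
as_jdoe=$(list_for_owner jdoe)

echo "mikeh sees: ${as_mikeh:-<nothing>}"
echo "jdoe sees:  ${as_jdoe:-<nothing>}"
```

If that default applies here, the empty output for jdoe is expected behavior and unrelated to the job submission error.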

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Friday, March 12, 2010 1:12 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Advanced Reservation question

On 12.03.2010 at 19:54, mhanby wrote:

> Hmm, next question.
>
> If I (the creator of the AR) submit a job requesting the
> reservation, it works fine:
>
> [mikeh@cluster1]$ qsub -ar 15 qsub_lam_hello.sh
> Your job 110972 ("j_lam_hello") has been submitted
>
> However, if the other user on the access list for the AR tries to
> submit a job, it fails:
>
> [jdoe@cluster1]$ qsub -ar 15 lammps-job.sh
> Unable to run job: error: no suitable queues.
> Exiting.

What is the state of the AR?

$ qrstat

I would assume it's in state "E" instead of "r" because of the  
following bug:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=3227

-- Reuti


> $ qrstat -ar 15 | grep acl
> acl_list                       mikeh,jdoe
>
> According to the man page, the user list is a comma-separated list of
> UNIX users.
>
> Any suggestions?
>
> I'd prefer that the administrators create reservations for the users.
>
> Mike
> -----Original Message-----
> From: mhanby [mailto:mhanby at uab.edu]
> Sent: Friday, March 12, 2010 11:48 AM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] Advanced Reservation question
>
> Thanks Reuti, that did the trick.
>
> AR is pretty darn cool!
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, March 12, 2010 3:46 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Advanced Reservation question
>
> Hi,
>
> On 12.03.2010 at 00:03, mhanby wrote:
>
>> I'm just trying out reservations for the first time.
>>
>> GE 6.2u5
>>
>> I'm having trouble creating an advanced reservation for a user
>> requesting 8 nodes with 8 slots each:
>>
>> $ qrsub -q '*@compute-0-1,*@compute-0-10,*@compute-0-12,*@compute-0-4,*@compute-0-6,*@compute-0-8,*@compute-0-9,*@compute-1-8' \
>>     -u jdoe,mikeh -a 201003120100 -e 201003211200 \
>>     -m bea -M mhanby at uab.edu -N Brain01
>
> The syntax is the same as for a normal qsub.
>
> In essence, you need to request "-pe orte 64" or similar to reserve
> more than one slot; the PE's allocation rule determines how the slots
> are reserved. What you then submit into the AR is unrelated: it can be
> a serial job or one with a different allocation rule, which will then
> be applied inside the granted slots *).
>
> -- Reuti
>
> *) At least that was the behavior when I last checked. Now it seems
> to always use the allocation_rule of the reservation's PE, even though
> the output states:
>
> parallel environment:  smp range: 2
>
> with "allocation_rule $pe_slots", and I get one slot on each of two
> machines. IMO it's a bug. It should either reject the job (no suitable
> queue in the AR, PE doesn't match) or use the allocation_rule of the
> job inside the granted slots of the AR (of course with the possibility
> that the job can't be scheduled inside the reservation).
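
Applied to the request quoted below, this advice amounts to adding a -pe request so that the reservation covers 64 slots instead of one. The sketch below only assembles and prints such a command line (a dry run, nothing is submitted). The exact command that was eventually used is not shown in the thread, so this is an assumption: the PE name lam_loose_rsh is taken from the qrstat output for AR 15 elsewhere in this thread, and whether the slots end up as 8 per host depends on that PE's allocation_rule.

```shell
#!/bin/sh
# Dry run: assemble and print a qrsub command line; nothing is submitted.
hosts="compute-0-1 compute-0-10 compute-0-12 compute-0-4 compute-0-6 compute-0-8 compute-0-9 compute-1-8"
slots_per_host=8          # from the original request: 8 nodes with 8 slots each
pe_name=lam_loose_rsh     # assumption: the PE shown in the qrstat output for AR 15

queue_list=""
host_count=0
for h in $hosts; do
    # Append "*@host", comma-separated after the first entry.
    queue_list="${queue_list:+$queue_list,}*@$h"
    host_count=$((host_count + 1))
done
total_slots=$((host_count * slots_per_host))

echo "qrsub -pe $pe_name $total_slots -q '$queue_list' -u jdoe,mikeh -a 201003120100 -e 201003211200 -N Brain01"
```

After submitting a command like the one printed, "qrstat -ar <id>" should list more than one entry in granted_slots_list if the PE request was honored.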
>
>
>> Your advance reservation 5 has been granted
>>
>> These are 8 nodes that are currently running jobs, but will be open
>> by the time the reservation starts
>>
>> $ qrstat -ar 5
>> --------------------------------------------------------------------------------
>> id                             5
>> name                           Brain01
>> owner                          mikeh
>> state                          W
>> start_time                     03/12/2010 01:00:00
>> end_time                       03/21/2010 12:00:00
>> duration                       226:00:00
>> submission_time                03/11/2010 16:53:14
>> group                          sge
>> account                        sge
>> granted_slots_list             all.q@compute-0-8.local=1
>> mail_options                   abe
>> mail_list                      root at localhost
>> acl_list                       jdoe,mikeh
>>
>> The output seems to indicate that the reservation is only being
>> created for one host and a single slot, instead of 8 hosts each
>> with 8 slots.
>>
>> Do I have the syntax wrong for GE6.2u5?
>>
>> Also, is there a way to modify an existing AR? I don't see a
>> 'qrmod' command.
>>
>>
>> =================================
>> Mike Hanby
>> mhanby at uab.edu
>> Information Systems Specialist II
>> IT HPCS / Research Computing
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248081
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>



