[GE users] Advanced reservation for cluster outage?

s_kreidl sabine.kreidl at uibk.ac.at
Tue Aug 11 10:01:51 BST 2009


I tried to use an advance reservation (AR) to handle an approaching cluster outage gracefully, but failed. If this is not the right approach for such a situation, please let me know how it is usually done. My main concern is to allow "backfilling" with jobs whose h_rt limit would let them finish before the outage.

We have SGE 6.2u2_1 installed.
We have two queues, all.q and par.q, both with enforced h_rt runtime limits (identical to the scheduler's default_duration).

I managed to reserve the majority of slots on the cluster with the following command line:
qrsub -a 200908161700.00 -e 200908171700.00 -u test_user -q "*" -pe "openmpi*" 770-

The resulting AR:
# qrstat -ar 92
id                             92
name                           NetApp
owner                          root
state                          w
start_time                     08/16/2009 17:00:00
end_time                       08/17/2009 17:00:00
duration                       24:00:00
submission_time                08/10/2009 14:45:15
group                          sge
account                        sge
granted_slots_list             par.q@n001=8,par.q@n003=8,...
granted_parallel_environment   openmpi* slots 770-9999999
acl_list                       test_user
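
The backfilling condition described above can be written down as simple arithmetic: a job may start only if its submission time plus its h_rt does not run past the start of the reservation. The following is a minimal Python sketch of that check, using the two timestamps from the qrsub command line. The helper names (parse_ar_time, fits_before) and the example "now" value are hypothetical, and this is not a description of how the SGE scheduler decides internally.

```python
from datetime import datetime, timedelta

# Timestamp layout used with qrsub -a / -e in the post: CCYYMMDDhhmm.SS
AR_FORMAT = "%Y%m%d%H%M.%S"


def parse_ar_time(stamp: str) -> datetime:
    """Parse a qrsub-style timestamp such as 200908161700.00."""
    return datetime.strptime(stamp, AR_FORMAT)


def fits_before(now: datetime, h_rt_seconds: int, ar_start: datetime) -> bool:
    """True if a job starting at `now` with the given h_rt ends by ar_start."""
    return now + timedelta(seconds=h_rt_seconds) <= ar_start


ar_start = parse_ar_time("200908161700.00")
ar_end = parse_ar_time("200908171700.00")
duration = ar_end - ar_start  # should agree with the AR's "duration" field

# Hypothetical "current time" for illustration (the date of the post).
now = datetime(2009, 8, 11, 10, 1, 51)

short_job_ok = fits_before(now, 60, ar_start)                 # -l h_rt=60
long_job_ok = fits_before(now, 10 * 24 * 3600, ar_start)      # 10-day h_rt
```

With these inputs, the 60-second job from item 2 below fits comfortably before the reservation starts, while a hypothetical 10-day job does not.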

There are two things going wrong with respect to what I'm trying to do:

1. I can still submit all.q jobs to the reserved nodes even when their runtime limits are too long for them to finish before the outage. So how do I reserve the whole cluster rather than a single queue, preferably with one command line?

2. Jobs submitted to par.q don't start, even if their runtime limit is well below the critical limit (I tried with -l h_rt=60). "qstat -g c" shows:
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
all.q                             0.51     86      0    418    504      0      0 
par.q                             1.00    624      0   -616      8      0      0
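
As a sanity check on the table above, both rows are consistent with AVAIL = TOTAL - USED - RES - aoACDS - cdsuE. That relation is an assumption inferred from these two rows only, not taken from the SGE documentation. The short Python sketch below reproduces the arithmetic and shows that the negative AVAIL for par.q follows from USED (624) exceeding TOTAL (8).

```python
# Values copied from the "qstat -g c" output in the post.
rows = {
    "all.q": {"used": 86, "res": 0, "avail": 418, "total": 504,
              "aoACDS": 0, "cdsuE": 0},
    "par.q": {"used": 624, "res": 0, "avail": -616, "total": 8,
              "aoACDS": 0, "cdsuE": 0},
}


def expected_avail(row: dict) -> int:
    """Assumed relation (inferred from the two rows, not from SGE docs)."""
    return (row["total"] - row["used"] - row["res"]
            - row["aoACDS"] - row["cdsuE"])


# Compare the assumed formula against the AVAIL column for each queue.
matches = {name: expected_avail(row) == row["avail"]
           for name, row in rows.items()}
```

Both queues match under this assumption, so the -616 is at least arithmetically explained by the USED and TOTAL columns as reported.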


Thanks in advance for your help.
Best,
Sabine

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211798
