[GE users] why won't scheduler make a reservation?

lacroute lacroute at stanford.edu
Thu Apr 8 23:09:39 BST 2010


I'm having some difficulty getting reservations to work consistently and would appreciate some advice.  I have been able to make reservations for some jobs, but the scheduler doesn't always create one when I request all of the resources available on a node (all 8 slots and most of the memory).

On our cluster the configuration for each execution host specifies 22GB of memory and 8 slots:

$ qconf -se node1
...
complex_values        slots=8,h_vmem=22G

I've set h_vmem to be a consumable resource:

$ qconf -sc
...
h_vmem              h_vmem     MEMORY      <=    YES         YES        2G       0
slots               s          INT         <=    YES         YES        1        1000

Reservations and logging have been enabled:

$ qconf -ssconf
...
params                            MONITOR=1
max_reservation                   20

I have created a special project with extra tickets to make some jobs high priority.  There is also a parallel environment for shared-memory applications called "shm" with allocation_rule $pe_slots.

When I submit a job like this, a reservation is created:

$ qsub -w e -P special -pe shm 7 -l h_vmem=3G -R y ./work.sh

Excerpt from the schedule log:
97246:1:RESERVING:1270784288:21660:P:shm:slots:7.000000
97246:1:RESERVING:1270784288:21660:H:scg1-1-10.local:slots:7.000000
97246:1:RESERVING:1270784288:21660:H:scg1-1-10.local:h_vmem:22548578304.000000
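The h_vmem figure in the host-level RESERVING record is in bytes. Because h_vmem is a per-slot consumable, the job books slots times the per-slot request. A quick arithmetic check, assuming G means 1024^3 bytes (which matches the logged value):

```shell
#!/bin/sh
# Check the h_vmem value in the RESERVING records: 7 slots x 3G per slot,
# with G taken as 1024^3 bytes (an assumption that matches the logged number).
slots=7
per_slot_bytes=$((3 * 1024 * 1024 * 1024))
echo $((slots * per_slot_bytes))   # prints 22548578304
```

That is 21G of the host's 22G, so the reservation for this job leaves 1G and 1 slot unbooked on scg1-1-10.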

When I submit a job that requests 8 slots and only 2G per slot, a reservation is sometimes created and sometimes not:

$ qsub -w e -P special -pe shm 8 -l h_vmem=2G -R y ./work.sh
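On paper both requests fit a single idle host with slots=8 and h_vmem=22G, since $pe_slots places all slots on one host and h_vmem is booked per slot. The sketch below checks that arithmetic with a hypothetical helper `fits_on_node`, using whole gigabytes for simplicity; the 8 x 3G line is an extra illustrative case, not a job from this post.

```shell
#!/bin/sh
# Hypothetical helper: does a $pe_slots job (all slots on one host) fit on an
# otherwise idle host? h_vmem is consumable per slot, so the job books
# slots * per-slot memory. Memory values are whole gigabytes here.
# Usage: fits_on_node SLOTS PER_SLOT_GB HOST_SLOTS HOST_GB
fits_on_node() {
    need_slots=$1
    need_mem=$(( $1 * $2 ))
    [ "$need_slots" -le "$3" ] && [ "$need_mem" -le "$4" ]
}

for req in "7 3" "8 2" "8 3"; do
    set -- $req
    if fits_on_node "$1" "$2" 8 22; then
        echo "$1 slots x $2G: fits (uses $(( $1 * $2 ))G of 22G)"
    else
        echo "$1 slots x $2G: does not fit"
    fi
done
```

Note that the 8-slot request uses every slot on the host, so unlike the 7-slot job it can only be placed on a host with no other slot-consuming jobs at the reserved start time.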

If I run this job when the cluster is nearly idle, it runs just fine.  I can't find any information about what would prevent the reservation (and I'm definitely not hitting the limit of 20 active reservations).  I'm especially baffled that it apparently works sometimes, so it must depend on the state of the cluster.  Any ideas?

Thanks,
Phil

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=252754
