[GE users] Resource Quotas causing problems for a single user

reuti reuti at staff.uni-marburg.de
Sun Sep 6 20:23:48 BST 2009


On 05.09.2009 at 03:03, m0zes wrote:

> Hello everyone,
> I seem to be having some issues with a recent change to my cluster
> setup. Previously, the cluster had one queue (batch.q), with a time
> restriction set up for @somenodes. I have now modified the setup to
> include batch.q, long.q, highmem.q, and long-highmem.q. I have been
> attempting to restrict things across queues using resource quotas. I
> made this complex change last Monday, and it worked until 4:00 today.
> The resource quotas that I have been using are:
> {
>    name         max_slots_per_user
>    description  "Set the maximum number of slots a user can utilize at once"
>    enabled      TRUE
>    limit        users {*} to slots=700
> }
> {
>    name         max_slots_per_host
>    description  NONE
>    enabled      TRUE
>    limit        hosts {@titans} to slots=16
>    limit        hosts {@brutes-small} to slots=4
>    limit        hosts {@brutes-large} to slots=8
>    limit        hosts {@scouts} to slots=8
>    limit        hosts {@rogues} to slots=8
>    limit        hosts {@fiends} to slots=4
> }
> {
>    name         max_slots_per_queue
>    description  NONE
>    enabled      TRUE
>    limit        queues batch.q to slots=1000
>    limit        queues test.q to slots=1000
>    limit        queues special.q to slots=1000
>    limit        queues long-highmem.q to slots=600
>    limit        queues highmem.q to slots=350
>    limit        queues long.q to slots=250
> }
> {
>    name         max_mem_per_host
>    description  NONE
>    enabled      TRUE
>    limit        hosts {@titans} to memory=64G
>    limit        hosts {@brutes-small} to memory=16G
>    limit        hosts {@brutes-large} to memory=32G
>    limit        hosts {@scouts} to memory=8G
>    limit        hosts {@rogues} to memory=8G
>    limit        hosts {@fiends} to memory=8G

Is "memory" a custom requestable and/or consumable that you defined? If so,
why didn't you stick with h_vmem or virtual_free?

What was the submit command you ran for the new job?
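
To make the first question concrete: `qconf -sc` lists every complex together with its requestable and consumable columns. Below is a minimal sketch that reads such a listing and reports those two columns for a given name. The sample text and the helper names (`parse_complexes`, `describe`) are only illustrative assumptions, and the "memory" row in particular is hypothetical, not taken from your cluster.

```python
# Sample text in the column layout printed by `qconf -sc`.
# All rows are illustrative; the "memory" row is a hypothetical custom
# complex and does not come from the poster's configuration.
SAMPLE_QCONF_SC = """\
#name               shortcut   type        relop requestable consumable default  urgency
#----------------------------------------------------------------------------------------
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
memory              mem        MEMORY      <=    YES         YES        0        0
virtual_free        vf         MEMORY      >=    YES         NO         0        0
"""

FIELDS = ("name", "shortcut", "type", "relop",
          "requestable", "consumable", "default", "urgency")


def parse_complexes(text):
    """Return {name: {column: value}} for every non-comment row."""
    complexes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # skip rows that do not match the expected layout
        row = dict(zip(FIELDS, parts))
        complexes[row["name"]] = row
    return complexes


def describe(complexes, name):
    """Summarise whether a complex exists, is requestable, is consumable."""
    row = complexes.get(name)
    if row is None:
        return f"{name}: not defined"
    return (f"{name}: requestable={row['requestable']}, "
            f"consumable={row['consumable']}")


if __name__ == "__main__":
    parsed = parse_complexes(SAMPLE_QCONF_SC)
    for complex_name in ("memory", "h_vmem", "virtual_free"):
        print(describe(parsed, complex_name))
```

On a real cluster the sample string would be replaced by the saved output of `qconf -sc`.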

> }
> Now when user1 submits a job, the job won't get executed. qstat -j
> $jobnum gives this output:
> cannot run because it exceeds limit "user1/////" in rule "max_slots_per_user/1"
> cannot run because it exceeds limit "user1/////" in rule "max_slots_per_user/1"
> cannot run in PE "single" because it only offers 0 slots
> This is impossible, as qquota -u \* shows that user1 is not using any
> of his slot quota
> resource quota rule limit                filter
> --------------------------------------------------------------------------------
> max_slots_per_user/1 slots=4/700          users user2
> max_slots_per_user/1 slots=58/700         users user3
> max_slots_per_user/1 slots=16/700         users user4
> max_slots_per_host/1 slots=2/16           hosts titan5
> max_slots_per_host/1 slots=1/16           hosts titan8
> max_slots_per_host/4 slots=3/8            hosts scout62
> max_slots_per_host/4 slots=8/8            hosts scout74
> max_slots_per_host/4 slots=8/8            hosts scout78
> max_slots_per_host/4 slots=6/8            hosts scout70
> max_slots_per_host/4 slots=4/8            hosts scout65
> max_slots_per_host/4 slots=8/8            hosts scout69
> max_slots_per_host/4 slots=8/8            hosts scout63
> max_slots_per_host/4 slots=8/8            hosts scout77
> max_slots_per_host/4 slots=6/8            hosts scout55
> max_slots_per_host/4 slots=8/8            hosts scout73
> max_slots_per_host/4 slots=8/8            hosts scout76
> max_slots_per_queue/1 slots=78/1000        queues batch.q
> max_mem_per_host/1 memory=12.000G/64.00 hosts titan5
> max_mem_per_host/1 memory=6.000G/64.000 hosts titan8
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout62
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout74
> max_mem_per_host/4 memory=2.000G/8.000G hosts scout78
> max_mem_per_host/4 memory=6.000G/8.000G hosts scout70
> max_mem_per_host/4 memory=4.000G/8.000G hosts scout65
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout69
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout63
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout77
> max_mem_per_host/4 memory=6.000G/8.000G hosts scout55
> max_mem_per_host/4 memory=8.000G/8.000G hosts scout73
> max_mem_per_host/4 memory=2.000G/8.000G hosts scout76
> The next line of the qstat -j output is odd to me, too:
> cannot run in PE "single" because it only offers 0 slots

It reports 0 when the slots it considers free are exhausted.
Defining the PE with an arbitrarily high slot count and limiting the
real usage by queue/host/RQS is a valid setup.
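
To see where the headroom is actually gone, the slot and memory lines of the quoted qquota output can be combined per host. The following is only a rough sketch for reading that output, not a reproduction of the scheduler's logic; the helper names (`headroom_by_host`, `exhausted_hosts`) are my own, and the input is a handful of lines copied from the listing quoted above.

```python
import re

# A few lines copied from the qquota output quoted in this thread.
QQUOTA_LINES = """\
max_slots_per_host/4 slots=3/8            hosts scout62
max_slots_per_host/4 slots=8/8            hosts scout74
max_slots_per_host/4 slots=8/8            hosts scout78
max_slots_per_host/4 slots=6/8            hosts scout70
max_mem_per_host/4 memory=8.000G/8.000G hosts scout62
max_mem_per_host/4 memory=8.000G/8.000G hosts scout74
max_mem_per_host/4 memory=2.000G/8.000G hosts scout78
max_mem_per_host/4 memory=6.000G/8.000G hosts scout70
"""

# rule, resource=used/limit, filter kind (users/hosts/queues), filter target
LINE_RE = re.compile(
    r"^(?P<rule>\S+)\s+(?P<resource>\w+)=(?P<used>[^/\s]+)/(?P<limit>\S+)"
    r"\s+(?P<kind>\w+)\s+(?P<target>\S+)$"
)


def to_number(value):
    """Convert '8.000G' or '8' to a float, ignoring a trailing G suffix."""
    return float(value.rstrip("G"))


def headroom_by_host(text):
    """Return {host: {resource: limit - used}} for 'hosts' filter lines."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None or match["kind"] != "hosts":
            continue
        free = to_number(match["limit"]) - to_number(match["used"])
        result.setdefault(match["target"], {})[match["resource"]] = free
    return result


def exhausted_hosts(headroom):
    """Hosts where at least one tracked resource has no headroom left."""
    return sorted(host for host, free in headroom.items()
                  if any(amount <= 0 for amount in free.values()))


if __name__ == "__main__":
    per_host = headroom_by_host(QQUOTA_LINES)
    print(exhausted_hosts(per_host))  # prints ['scout62', 'scout74', 'scout78']
```

Note that scout62 still has 5 free slots in that listing but no memory left under max_mem_per_host, so a host can be unusable for a job even though its slot limit is not reached.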

-- Reuti

> Again, this shouldn't happen as the PE "single" contains 10000 slots,
> (10 times the number of cores in the cluster).
> I have tried restarting qmaster, it didn't seem to have any effect. I
> cannot restart the execd services on the nodes at the moment, as some
> of them are still loaded.
> Anyone have any thoughts about this lengthy/complex setup?
> --
> Adam
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215858
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

