[GE users] Resource Quotas causing problems for a single user

m0zes adam.tygart at gmail.com
Sun Sep 6 21:20:40 BST 2009



Sorry to have wasted your time; I just figured out my problem.

It turns out I didn't have the complex resource mem defined within the
queue for all of the hosts in the queue. After I added it, the jobs
began to execute.
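
For the archives, the missing bit was a per-host(group) complex_values
entry in the queue configuration, roughly along these lines (assuming the
complex in question is the "memory" consumable from the quota setup quoted
below, with values that simply mirror the max_mem_per_host limits; edited
via "qconf -mq batch.q" and likewise for the other queues):

    complex_values        NONE,[@titans=memory=64G],[@brutes-small=memory=16G], \
                          [@brutes-large=memory=32G],[@scouts=memory=8G], \
                          [@rogues=memory=8G],[@fiends=memory=8G]

Every host or host group in the queue needs a value here; otherwise jobs
requesting the resource cannot be dispatched to the uncovered hosts.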

Thanks for your time,

Adam

On Sun, Sep 6, 2009 at 14:23, reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> Am 05.09.2009 um 03:03 schrieb m0zes:
>
>> Hello everyone,
>>
>> I seem to be having some issues with a recent change to my cluster
>> setup. Previously, the cluster had one queue (batch.q), with a time
>> restriction setup for @somenodes. I have now modified the setup to
>> include batch.q, long.q, highmem.q, and long-highmem.q. I have been
>> attempting to restrict things across queues using resource quotas. I
>> made this complex change last Monday, and it worked until 4:00 today.
>> The resource quotas that I have been using are:
>>
>> {
>>    name         max_slots_per_user
>>    description  "Set the maximum number of slots a user can utilize at once"
>>    enabled      TRUE
>>    limit        users {*} to slots=700
>> }
>> {
>>    name         max_slots_per_host
>>    description  NONE
>>    enabled      TRUE
>>    limit        hosts {@titans} to slots=16
>>    limit        hosts {@brutes-small} to slots=4
>>    limit        hosts {@brutes-large} to slots=8
>>    limit        hosts {@scouts} to slots=8
>>    limit        hosts {@rogues} to slots=8
>>    limit        hosts {@fiends} to slots=4
>> }
>> {
>>    name         max_slots_per_queue
>>    description  NONE
>>    enabled      TRUE
>>    limit        queues batch.q to slots=1000
>>    limit        queues test.q to slots=1000
>>    limit        queues special.q to slots=1000
>>    limit        queues long-highmem.q to slots=600
>>    limit        queues highmem.q to slots=350
>>    limit        queues long.q to slots=250
>> }
>> {
>>    name         max_mem_per_host
>>    description  NONE
>>    enabled      TRUE
>>    limit        hosts {@titans} to memory=64G
>>    limit        hosts {@brutes-small} to memory=16G
>>    limit        hosts {@brutes-large} to memory=32G
>>    limit        hosts {@scouts} to memory=8G
>>    limit        hosts {@rogues} to memory=8G
>>    limit        hosts {@fiends} to memory=8G
>
> Is "memory" a custom requestable and/or consumable you defined? Why
> didn't you stick with h_vmem or virtual_free?
>
> What was the submit command you ran for the new job?
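
(For context, a site-defined consumable of that kind would typically show
up in "qconf -sc" along the lines below, and a job would request it at
submit time with -l; the shortcut, values and job script name here are
only illustrative.)

    #name     shortcut  type    relop  requestable  consumable  default  urgency
    memory    mem       MEMORY  <=     YES          YES         0        0

    # e.g. a matching submit line:
    qsub -pe single 1 -l memory=2G job.sh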
>
>> }
>>
>> Now when user1 submits a job, the job won't get executed. qstat -j
>> $jobnum gives this output:
>>
>> cannot run because it exceeds limit "user1/////" in rule
>> "max_slots_per_user/1"
>> cannot run because it exceeds limit "user1/////" in rule
>> "max_slots_per_user/1"
>> cannot run in PE "single" because it only offers 0 slots
>>
>> This is impossible, as qquota -u \* shows that user1 is not using any
>> of his slot quota:
>> resource quota rule limit                filter
>> --------------------------------------------------------------------------------
>> max_slots_per_user/1 slots=4/700          users user2
>> max_slots_per_user/1 slots=58/700         users user3
>> max_slots_per_user/1 slots=16/700         users user4
>> max_slots_per_host/1 slots=2/16           hosts titan5
>> max_slots_per_host/1 slots=1/16           hosts titan8
>> max_slots_per_host/4 slots=3/8            hosts scout62
>> max_slots_per_host/4 slots=8/8            hosts scout74
>> max_slots_per_host/4 slots=8/8            hosts scout78
>> max_slots_per_host/4 slots=6/8            hosts scout70
>> max_slots_per_host/4 slots=4/8            hosts scout65
>> max_slots_per_host/4 slots=8/8            hosts scout69
>> max_slots_per_host/4 slots=8/8            hosts scout63
>> max_slots_per_host/4 slots=8/8            hosts scout77
>> max_slots_per_host/4 slots=6/8            hosts scout55
>> max_slots_per_host/4 slots=8/8            hosts scout73
>> max_slots_per_host/4 slots=8/8            hosts scout76
>> max_slots_per_queue/1 slots=78/1000        queues batch.q
>> max_mem_per_host/1 memory=12.000G/64.00 hosts titan5
>> max_mem_per_host/1 memory=6.000G/64.000 hosts titan8
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout62
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout74
>> max_mem_per_host/4 memory=2.000G/8.000G hosts scout78
>> max_mem_per_host/4 memory=6.000G/8.000G hosts scout70
>> max_mem_per_host/4 memory=4.000G/8.000G hosts scout65
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout69
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout63
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout77
>> max_mem_per_host/4 memory=6.000G/8.000G hosts scout55
>> max_mem_per_host/4 memory=8.000G/8.000G hosts scout73
>> max_mem_per_host/4 memory=2.000G/8.000G hosts scout76
>>
>> The next line of the qstat -j output is odd to me, too:
>> cannot run in PE "single" because it only offers 0 slots
>
> It says 0 when the slots it sees as available are exhausted. Defining
> the PE with an arbitrarily high slot count and limiting the real usage
> by queue/host/RQS is a valid setup.
>
> -- Reuti
>
>> Again, this shouldn't happen, as the PE "single" contains 10000 slots
>> (10 times the number of cores in the cluster).
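
(For reference, such a wide-open PE, as printed by "qconf -sp single",
might look roughly like the sketch below; apart from the oversized slots
value, the remaining fields, in particular allocation_rule and the
start/stop procedures, are only guesses.)

    pe_name            single
    slots              10000
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $pe_slots
    control_slaves     FALSE
    job_is_first_task  TRUE
    urgency_slots      min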
>>
>> I have tried restarting the qmaster, but it didn't seem to have any effect. I
>> cannot restart the execd services on the nodes at the moment, as some
>> of them are still loaded.
>>
>> Anyone have any thoughts about this lengthy/complex setup?
>>
>> --
>> Adam
>>
>
