[GE users] scheduler crashes on applying qrs for queued job?

Reuti reuti at staff.uni-marburg.de
Mon Sep 17 17:48:36 BST 2007


Henk:

On 17.09.2007 at 18:04, SLIM H.A. wrote:

> The problem I want to solve is this. Part of the cluster was financed
> by a small group of users but this subcluster may also be used by
> others. The owner-users should have access on short notice and/or high
> priority compared to the other users.
> One option is to give the owner-users unlimited cpu time and restrict
> this for others. This would allow other users access for a guaranteed
> limited amount of time, when the subcluster is not used by the owners.
> I agree I could create additional queues to manage this but would like
> to keep things simple for the users and use a single queue. Our system
> is quite heterogeneous wrt nodes, with 3 types of interconnect and
> various serial and parallel queues. The rqs looked like a simple
> solution (the example that crashes the scheduler is merely a simple
> test)

Resource quotas aren't designed to impose hard limits on jobs; they
manage consumable resources. A hard run-time limit is the job of h_rt
in the queue definition.
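
For comparison, a sketch of the kind of limit a resource quota set is
meant for, using the consumable "slots". The rule name and the value 4
are made up for illustration:

    {
       name         slots_per_user
       description  "At most 4 slots per user on @pvmhosts"
       enabled      TRUE
       limit        users {*} hosts @pvmhosts to slots=4
    }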

[Maybe combining the two will work: make h_rt consumable and set it to
a really high value in the exec host definition, so that it is
virtually "unlimited". Is the scheduler still crashing then?]

If I understood you correctly:

- you need one user group (ACL) for your small group of owners
- this group must be in xuser_lists of a queue with h_rt=600, call it
pvm_ext
- this group must be in user_lists of a queue with unlimited h_rt, call
it pvm_int
- limit slots=x for each host in @pvmhosts
- qsub to a queue with -q pvm*; as each user will end up in only one
of the queues, it's well defined
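
A minimal sketch of the setup above, assuming a hypothetical ACL named
"owners", a placeholder slot count of 4 and a placeholder job script
job.sh. Only the attributes that matter here are shown, and the lines
starting with # are annotations, not part of the configuration:

    # ACL for the owner group, e.g. created with: qconf -au <user> owners

    # queue pvm_ext (qconf -mq pvm_ext): everyone except the owners,
    # 10 minute wall clock limit
    qname         pvm_ext
    hostlist      @pvmhosts
    user_lists    NONE
    xuser_lists   owners
    h_rt          0:10:0

    # queue pvm_int (qconf -mq pvm_int): owners only, no limit
    qname         pvm_int
    hostlist      @pvmhosts
    user_lists    owners
    xuser_lists   NONE
    h_rt          INFINITY

    # exec host (qconf -me <host>), for each host in @pvmhosts
    complex_values   slots=4

    # submission, identical for both groups
    qsub -q "pvm*" job.sh

The quotes around pvm* only keep the shell from expanding the wildcard
before qsub sees it.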

-- Reuti


> Regards
>
> Henk
>
>>
>> Resource quotas are targeting consumable and fixed complexes
>> for now.
>> Making h_rt consumable is not really an option, but it would
>> be possible to define two queues on @pvmhosts with different
>> user_lists set. The time you set there (in the queue
>> definition) for h_rt in one of them will also be enforced.
>> Maybe you have to limit the total slot count also in the
>> exechost definition, as you have now (at least) two queues
>> per machine.
>>
>> But anyway: crashing the scheduler is of course a bug,
>> instead there should be an error message. I suggest to file
>> an issue for it.
>>
>> -- Reuti
>>
>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 17 September 2007 16:12
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] scheduler crashes on applying qrs for
>> queued job?
>>
>> On 17.09.2007 at 16:33, SLIM H.A. wrote:
>>
>>> The following problem occurred with 6.1u2 on OpenSUSE 10:
>>>
>>> I have a single resource quota set with a limit set for a single user
>>> dcl0has:
>>>
>>> {
>>>    name         ham3
>>>    description  "Restrict wall clock time"
>>>    enabled      TRUE
>>>    limit        users dcl0has hosts @pvmhosts to h_rt=600
>>> }
>>>
>>> When user dcl0has submits a job the scheduler crashes. The job remains
>>> in the qw state and after the scheduler has been restarted it will try
>>> to schedule it but crashes again.
>>>
>>> Replacing the user by an acl, like
>>>
>>>    limit        users testproject hosts @pvmhosts to h_rt=600
>>>
>>> where qconf -su testproject gives
>>> name    testproject
>>> type    ACL
>>> fshare  0
>>> oticket 0
>>> entries dcl0has, ... and more users
>>>
>>> does work for user dcl0has although the wall clock time limit of 10
>>> minutes is not enforced. I assume h_rt can be used in rq sets?
>>>
>>> I haven't found anything relevant in the message files (yet).
>>> Is this a known problem?
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>
>
