[GE users] scheduler crashes on applying qrs for queued job?

Reuti reuti at staff.uni-marburg.de
Mon Sep 17 19:49:29 BST 2007


On 17.09.2007 at 18:47, Andreas.Haas at Sun.COM wrote:

> On Mon, 17 Sep 2007, Reuti wrote:
>
>> On 17.09.2007 at 16:33, SLIM H.A. wrote:
>>
>>> The following problem occurred with 6.1u2 on OpenSUSE 10:
>>> I have a single resource quota set with a limit set for a single  
>>> user
>>> dcl0has:
>>> {
>>>   name         ham3
>>>   description  "Restrict wall clock time"
>>>   enabled      TRUE
>>>   limit        users dcl0has hosts @pvmhosts to h_rt=600
>>> }
>>> When user dcl0has submits a job the scheduler crashes. The job  
>>> remains
>>> in the qw state and after the scheduler has been restarted it  
>>> will try
>>> to schedule it but crashes again.
>>> Replacing the user by an acl, like
>>>
>>>   limit        users testproject hosts @pvmhosts to h_rt=600
>>> where qconf -su testproject gives
>>> name    testproject
>>> type    ACL
>>> fshare  0
>>> oticket 0
>>> entries dcl0has, ... and more users
>>> does work for user dcl0has although the wall clock time limit of 10
>>> minutes is not enforced. I assume h_rt can be used in rq sets?
>>> I haven't found anything relevant in the message files (yet).
>>> Is this a known problem?
>>
>> Resource quotas are targeting consumable and fixed complexes for  
>> now. Making h_rt consumable is not really an option, but it would  
>> be possible to define two queues on @pvmhosts with different  
>> user_lists set. The time you set there (in the queue definition)  
>> for h_rt in one of them will also be enforced. Maybe you have to  
>> limit the total slot count also in the exechost definition, as you  
>> have now (at least) two queues per machine.
>
> I see no indication Henk actually made h_rt a consumable, so the  
> case should work.

I missed that :-/ After getting home and thinking it over again:  
isn't what he is asking for actually a limit per job?

>>> limit        users testproject hosts @pvmhosts to h_rt=600

With this rule, all users in testproject using @pvmhosts may request  
h_rt=600 in total between them; written as {testproject}, the limit  
applies to each user in this userlist separately. But a limit per job  
isn't implemented so far - or did I miss it? I filed:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2147

and (http://gridengine.sunsource.net/issues/show_bug.cgi?id=2148)  
some time ago, which cover this.
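
To make the difference concrete, here is a sketch of the two  
alternative limit lines in the notation used in this thread. I have  
not checked whether the userlist needs an @ prefix in a rule, so take  
the spelling as an assumption. They are alternatives, not meant to be  
used together in one rule set:

```
limit        users testproject hosts @pvmhosts to h_rt=600
limit        users {testproject} hosts @pvmhosts to h_rt=600
```

The first line is one quota shared by everyone in testproject, the  
second gives each user in the list a quota of their own. Neither of  
them caps the h_rt of a single job.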

-- Reuti


>> But anyway: crashing the scheduler is of course a bug; there  
>> should be an error message instead. I suggest filing an issue for it.
>
> Yes please, Henk, file one.
>
> Thanks,
> Andreas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




