[GE users] scheduler crashes on applying qrs for queued job?

Reuti reuti at staff.uni-marburg.de
Mon Sep 17 16:12:26 BST 2007


On 17.09.2007 at 16:33, SLIM H.A. wrote:

> The following problem occurred with 6.1u2 on OpenSUSE 10:
>
> I have a single resource quota set with a limit set for a single user
> dcl0has:
>
> {
>    name         ham3
>    description  "Restrict wall clock time"
>    enabled      TRUE
>    limit        users dcl0has hosts @pvmhosts to h_rt=600
> }
>
> When user dcl0has submits a job the scheduler crashes. The job remains
> in the qw state and after the scheduler has been restarted it will try
> to schedule it but crashes again.
>
> Replacing the user by an acl, like
>
>    limit        users testproject hosts @pvmhosts to h_rt=600
>
> where qconf -su testproject gives
> name    testproject
> type    ACL
> fshare  0
> oticket 0
> entries dcl0has, ... and more users
>
> does work for user dcl0has although the wall clock time limit of 10
> minutes is not enforced. I assume h_rt can be used in rq sets?
>
> I haven't found anything relevant in the message files (yet).
> Is this a known problem?

Resource quotas currently target only consumable and fixed complexes.
Making h_rt consumable is not really an option, but you could define
two queues on @pvmhosts with different user_lists. The h_rt limit you
set in a queue definition is enforced for jobs running in that queue.
Since you would then have (at least) two queues per machine, you may
also need to limit the total slot count in the exechost definition.
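
A rough sketch of such a setup, showing only the relevant attributes.
The queue names short.q and long.q, the host name node01 and the slot
counts are made up for illustration; testproject is the ACL from your
mail. Using xuser_lists on the second queue to keep those users out of
it is my assumption about what you want, so adjust as needed:

```
# short.q (hypothetical name), edit with: qconf -mq short.q
qname        short.q
hostlist     @pvmhosts
user_lists   testproject
slots        4
h_rt         0:10:0

# long.q (hypothetical name), edit with: qconf -mq long.q
qname        long.q
hostlist     @pvmhosts
xuser_lists  testproject
slots        4
h_rt         INFINITY

# exec host (hypothetical name), edit with: qconf -me node01
hostname        node01
complex_values  slots=4
```

With slots=4 in the exechost complex_values, both queues together
cannot use more than 4 slots on that machine, even though each queue
is configured with 4 slots of its own.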

But in any case, crashing the scheduler is of course a bug; there
should be an error message instead. I suggest filing an issue for it.

-- Reuti
