[GE users] Problems with PEs and resource quotas

reuti reuti at staff.uni-marburg.de
Tue Dec 14 10:18:22 GMT 2010


Hi,

On 13.12.2010 at 21:27, mdsteeves wrote:

> We're running SGE 6.2u4 on RHEL5.4.
> 
> We've set up Olesen's FLEXlm integration to help users run jobs on the 
> cluster that require FLEXlm licenses, and would also like to set up a 
> resource quota so that users who launch jobs are not able to lock up 
> all of the licenses:
> 
> {
>    name         moe_limit
>    description  limit everyone to no more than 20 moe license
>    enabled      TRUE
>    limit        users {*} to moe=20
> }
> 
> For some reason, though, we're running into problems: jobs that use 
> PEs and also request certain resources with the "-l" switch get stuck 
> in the qw state, and the scheduling message references the resource 
> quota:
> 
> scheduling info:            queue instance "mpi.q@compute-1-25.local" dropped because it is disabled
>                             queue instance "himem.q@compute-0-11.local" dropped because it is disabled
>                             queue instance "mpi.q@compute-1-26.local" dropped because it is full
>                             cannot run in queue "himem.q" because it is not contained in its hard queue list (-q)
>                             cannot run because it exceeds limit "steevmi1/////" in rule "moe_limit/1"
>                             cannot run in PE "orte" because it only offers 0 slots
> 
> For testing, I've been using the following script:
> 
> #!/bin/bash
> 
> #$ -S /bin/ksh
> #$ -j y
> #$ -cwd
> #$ -q mpi.q
> #$ -pe orte 8
> #$ -N mdsTest
> ##  The following all work:
> ##  #$ -l h_cpu=1
> ##  #$ -l mem_total=5G
> ##  #$ -l arch=lx26-amd64
> ##  #$ -l moe=1
> ##  Any of the following do not work, and cause the job to hang in the queue:
> ##  #$ -l q=mpi.q
> ##  #$ -l hostname="compute-0-2"
> ##  #$ -l hostname="compute-0-78|compute-0-106|compute-0-69|compute-0-68|compute-0-100|compute-0-63|compute-0-93|compute-0-82|compute-0-76"

I don't see a resource reservation request in the lines above: #$ -R y

For the request to have any effect, "max_reservation" in the scheduler configuration must also be set to 20 or another appropriate value. Slots should then be reserved for this job, so that it won't starve.
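
As a rough sketch of what I mean, the test job with a reservation request added could look like the following. The file name mdsTest_reserve.sh is only a placeholder I made up, and I picked one of the failing "-l hostname" requests from the script above for illustration; the value of max_reservation itself has to be changed by an admin with "qconf -msconf", which opens an editor.

```shell
#!/bin/bash
# Sketch only: writes the test job from this thread with "-R y" added,
# then submits it if the SGE client tools are available.

job_script="mdsTest_reserve.sh"   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#$ -S /bin/ksh
#$ -j y
#$ -cwd
#$ -q mpi.q
#$ -pe orte 8
#$ -N mdsTest
#$ -R y
#$ -l hostname="compute-0-2"
hostname
sleep 300
EOF

if command -v qconf >/dev/null 2>&1 && command -v qsub >/dev/null 2>&1; then
    # "-R y" is ignored while max_reservation is 0, so show the current value.
    qconf -ssconf | grep max_reservation || true
    qsub "$job_script"
else
    echo "SGE client tools not found; job script written to $job_script"
fi
```

After submission, "qstat -j <job id>" shows the scheduling info again, so you can check whether the "moe_limit" message is still there.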

Does this fix the issue?

-- Reuti


> hostname
> sleep 300
> 
> Even switching from "-q mpi.q" to "-masterq mpi.q" doesn't help any. If 
> we disable the resource quota rule, then the jobs run without any 
> problems. Is there something that we're missing?
> 
> 
> -Mike
> -- 
> Michael Steeves (mdsteeves at gmail.com)
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=305177
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=305386
