[GE users] Problems with PEs and resource quotas

reuti reuti at staff.uni-marburg.de
Tue Dec 14 20:34:14 GMT 2010

On 14.12.2010 at 21:02, mdsteeves wrote:

> On 12/14/10 5:18 AM, reuti wrote:
> [SNIP]
>> I don't see any resource reservation in the above lines: #$ -R
>> And for it to have an effect, it's necessary to set "max_reservation 20" or an appropriate value in the scheduler configuration. Then slots should be reserved for this job, so that it won't starve.
>> Does this fix the issue?
> Resource reservation for the resource quota piece? We don't use that at 
> the moment -- the moe_limit that's currently in place limits each user 
> to only be able to have 20 jobs running, which is the behavior that we 
> want. The problem we're having is that other jobs, which don't need or
> use these licenses, get stuck in the "qw" state, referencing the
> moe_limit resource quota. If we go in and disable the resource quota,
> then the job gets dispatched to a node and runs without problem.
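For reference, the reservation setup suggested in the quoted reply might look roughly like the following sketch. It assumes an SGE admin host with `qconf` on the PATH and GNU `sed`; the PE name `mpi`, the slot count 8 and `job.sh` are hypothetical placeholders, and 20 is simply the value mentioned above.

```shell
# Scheduler side (once, as an SGE manager): allow up to 20 reservations.
# Dump the current scheduler configuration, change max_reservation,
# and load the edited file back (qconf -msconf does the same in an editor).
qconf -ssconf > /tmp/sched_conf
sed -i 's/^max_reservation .*/max_reservation 20/' /tmp/sched_conf
qconf -Msconf /tmp/sched_conf

# Job side: ask for a reservation at submission time; this is equivalent to
# a "#$ -R y" line in the job script. "mpi", 8 and job.sh are placeholders.
qsub -R y -pe mpi 8 job.sh
```

Without a non-zero max_reservation the -R y request has no effect, which is why both halves are needed.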

AFAICS, in all the examples you mentioned as not working, you are limiting the number of queue instances the job can potentially run in:

## #$ -l q=mpi.q
## #$ -l hostname="compute-0-2"
## #$ -l hostname...

Hence SGE has fewer options for scheduling the job. Or does it also happen in an empty cluster?

Nevertheless, one bug worth mentioning: you can't use -q in combination with -l h=. The workaround is to put the hostname into the -q request:

-q mpi.q@compute-0-2
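In job-script form, the workaround might look like this fragment. The PE name `mpi` and the slot count are hypothetical placeholders; the host part after `@` has to match the queue instance name SGE itself reports (for example in `qstat -f` output), which on some clusters may be a fully qualified name rather than the short one.

```shell
#!/bin/sh
# Hypothetical job script header: "mpi" (PE name) and 4 (slots) are placeholders.
#$ -S /bin/sh
#$ -pe mpi 4
# Avoid combining "-q mpi.q" with "-l h=compute-0-2";
# request the queue instance (queue@host) directly instead:
#$ -q mpi.q@compute-0-2
# To allow more than one host, list several queue instances, comma-separated
# (shown commented out with the same "## #$" convention as above):
## #$ -q mpi.q@compute-0-2,mpi.q@compute-0-3

echo "Running on $(hostname)"
```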

-- Reuti

> If we don't use either "-l qname=...." or "-l hostname=...." when we 
> submit the job, then it launches without problem.
> If we don't specify a parallel environment, but leave the -l requests in 
> the job submission, then it launches without a problem.
> While I haven't tested each and every resource that could be requested 
> when a job is submitted, the jobs only seem to stick in a qw state if we 
> try to request either a queue or a host.
> -Mike
