[GE users] Cannot run because resources requested are not available

spow_ miomax_ at hotmail.com
Wed Aug 18 09:03:51 BST 2010



Hi,

Sorry for the delayed answer, the access to the problematic cluster is very restricted.

> Date: Mon, 16 Aug 2010 16:43:05 +0200
> From: reuti at staff.uni-marburg.de
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Cannot run because resources requested are not available
>
> Hi,
>
> Am 16.08.2010 um 16:32 schrieb spow_:
>
> > I am requesting two instances because I read that mem_free may result in oversubscription if used alone.
>
> not when you make it (mem_free) consumable and attach a feasible value in the exechost definition. When the measured value is lower than SGE's internal bookkeeping of this complex, this will be used.
>
> > The mem_token is supposed to reserve the amount of RAM at submission, whereas mem_free does not guarantee it (from what I have understood).
> >
> > I found this tip in a discussion in which you participated. You had a preference for using h_vmem, but it kills the jobs that are wrongly defined, so I'd rather use token+free for now.
>
> http://gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler
>
> Just use what fits better to your needs.

I'll be looking into this if the current method keeps failing.
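For reference, a minimal sketch of the consumable setup described above (the column layout is the standard `qconf -mc` one; the host name node01 and the 8G value are placeholders, not taken from my cluster):

```shell
# Make mem_free consumable: run `qconf -mc` and change its line to
# name      shortcut  type    relop  requestable  consumable  default  urgency
mem_free    mf        MEMORY  <=     YES          YES         0        0

# Then attach a feasible value on each execution host:
# run `qconf -me node01` and set, e.g.
complex_values  mem_free=8G
```

With that in place, SGE compares its internal bookkeeping of the consumable against the measured value and uses the lower of the two, as described above.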

> > As for the parallel jobs, they run fine if no resources requests are made.
> > The number of slots defined in the PE is twice the PE 'size': e.g. MPI-4, used by a queue spanning hosts 1 to 4, has 8 slots (the hosts are dual-core).
>
> What allocation_rule and what did you request in `qsub`?

My AR is $round_robin.
The qsub looks like this: qsub -hard -l mem_token=1G -l mem_free=1G -pe "mpi*" 8 <jobname>

I have then removed mem_free as a consumable in the exechost (and left mem_token and slots in the exechost consumable/fixed attributes).
If I now submit qsub -hard -l mem_free=1G -pe "mpi*" 8 <jobname> (i.e. no mem_token request), it does work.
The problem seems to come from the consumable definition in the exechost, but I do not have this problem on the test cluster (where only mem_token and slots are defined, so the configuration is basically the same).
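In case it helps to pin this down, here is how I have been comparing the two clusters (node01 is a placeholder host name; the commands are standard SGE ones):

```shell
# Show the complex definitions (note the consumable column) on each cluster
qconf -sc | egrep 'mem_free|mem_token|slots'

# Show the complex_values actually attached to an execution host
qconf -se node01

# After a failed submission, the scheduler's reason string
# ("cannot run because resources requested are not available")
# appears here, provided schedd_job_info is enabled in the scheduler config:
qstat -j <jobid>
```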

<snip>
> As asked: does a parallel job run w/o any resource request?

Parallel jobs run just fine if no resources requests are made at all.

GQ


