[GE users] Cannot run because resources requested are not available

reuti reuti at staff.uni-marburg.de
Wed Aug 18 11:09:31 BST 2010


On 18.08.2010 at 10:03, spow_ wrote:

> Sorry for the delayed answer, the access to the problematic cluster is very restricted.
> > Date: Mon, 16 Aug 2010 16:43:05 +0200
> > From: reuti at staff.uni-marburg.de
> > To: users at gridengine.sunsource.net
> > Subject: Re: [GE users] Cannot run because resources requested are not available
> > 
> > Hi,
> > 
> > On 16.08.2010 at 16:32, spow_ wrote:
> > 
> > > I am requesting two instances because I read that mem_free may result in oversubscription if used alone.
> > 
> > Not when you make it (mem_free) consumable and attach a feasible value in the exechost definition. When the measured value is lower than SGE's internal bookkeeping of this complex, the measured value will be used.
> >  
> > > The mem_token is supposed to reserve the amount of RAM at submission, whereas mem_free does not guarantee it (from what I have understood).
> > > 
> > > I found this tip in a discussion in which you participated. You had a preference for using h_vmem, but it kills the jobs that are wrongly defined, so I'd rather use token+free for now.
> > 
> > http://gridengine.info/2009/12/01/adding-memory-requirement-awareness-to-the-scheduler
> > 
> > Just use what fits better to your needs.
> I'll be looking into this if the current method keeps failing.
> > > As for the parallel jobs, they run fine if no resource requests are made.
> > > The number of slots defined in the PE is equal to 2 times the PE 'size' : i.e. MPI-4 that is used by a queue spanning from host 1 to host 4 has 8 slots (because hosts are dual-core).
> > 
> > What allocation_rule and what did you request in `qsub`?
> My AR is $round_robin.
> The qsub looks like this : qsub -hard -l mem_token=1G -l mem_free=1G -pe "mpi*" 8 <jobname>
> I have then removed mem_free as a consumable in the exechost (and left mem_token and slots in the exechost consumable/fixed attributes).

So mem_free is now only a load value?

> If I now submit qsub -hard -l mem_free=1G -pe "mpi*" 8 <jobname>    (i.e. no mem_token request) it does work.

Then you might indeed consume more than is available, as it's only a snapshot of the actual memory usage, which may vary over time. When "mem_free" is made consumable, the lower of a) the measured free memory or b) the computed consumable complex will be taken into account.
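
To illustrate the rule above, here is a minimal sketch in Python. It is not SGE code, only a toy model of the comparison; the function names, the parameters and the numbers are made up for illustration, and it assumes the bookkeeping is simply "configured value minus what running jobs have already requested".

```python
def effective_mem_free(measured_free_gb, configured_gb, already_requested_gb):
    """Memory (in GB) treated as available when mem_free is consumable.

    Hypothetical model: the bookkeeping value is the configured exechost
    value minus the requests of running jobs, and the lower of the
    bookkeeping value and the measured load value is used.
    """
    bookkeeping_gb = configured_gb - already_requested_gb
    return min(measured_free_gb, bookkeeping_gb)


def request_fits(request_gb, measured_free_gb,
                 configured_gb=None, already_requested_gb=0.0):
    """Would a request of request_gb be accepted on this host?

    With configured_gb=None, mem_free is modelled as a plain load value:
    only the current snapshot is checked, so memory promised to running
    jobs that have not touched it yet is not accounted for.
    """
    if configured_gb is None:
        return request_gb <= measured_free_gb
    available = effective_mem_free(measured_free_gb, configured_gb,
                                   already_requested_gb)
    return request_gb <= available


# Host with 8 GB configured; running jobs requested 7.5 GB but currently
# use only 2 GB, so the load sensor still reports 6 GB free.
print(request_fits(1.0, measured_free_gb=6.0))
print(request_fits(1.0, measured_free_gb=6.0,
                   configured_gb=8.0, already_requested_gb=7.5))
```

In the first call only the snapshot is checked, so the request is accepted although the memory is already promised to other jobs; in the second call the bookkeeping value (0.5 GB) is the lower one and the request is refused.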

> The problem seems to come from the consumable definition in the exechost, but I do not have this problem on the test cluster. (where only mem_token and slots are defined, so it's basically the same configuration)

The complex definition is the same, and also the RQSs (if any are defined) are the same?

-- Reuti

