[GE users] Cannot run because resources requested are not available

spow_ miomax_ at hotmail.com
Mon Aug 16 15:32:01 BST 2010



Hi Reuti,

I am requesting two instances because I read that mem_free may result in oversubscription if used alone. mem_token is supposed to reserve the amount of RAM at submission, whereas mem_free does not guarantee it (from what I have understood).

I found this tip in a discussion in which you participated. You had a preference for using h_vmem, but it kills jobs that are wrongly defined, so I'd rather use mem_token + mem_free for now.
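
To make the bookkeeping I have in mind concrete, here is a rough sketch. It assumes (I have not verified this on 6.0u2) that a consumable request is counted once per granted slot, so a 1G request with 2 slots on a dual-core host needs 2G of mem_token on that host. The helper names required_mb and fits_on_host are made up for illustration and are not part of Grid Engine.

```shell
# Hypothetical helper: memory (in MB) a host must offer for one job,
# ASSUMING the consumable request is counted once per granted slot.
required_mb() {
    per_slot_mb=$1
    slots_on_host=$2
    echo $((per_slot_mb * slots_on_host))
}

# Hypothetical check: succeeds if the host's remaining consumable (in MB)
# covers the request. Arguments: per-slot MB, slots on host, available MB.
fits_on_host() {
    [ "$(required_mb "$1" "$2")" -le "$3" ]
}

required_mb 1024 2   # 1G per slot, 2 slots on a dual-core host
```

If that assumption holds, a host whose mem_token is set below slots times the request would never be eligible, whatever mem_free reports.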

As for the parallel jobs, they run fine if no resource requests are made.
The number of slots defined in each PE is twice the PE 'size': e.g. MPI-4, which is used by a queue spanning host 1 to host 4, has 8 slots (because the hosts are dual-core).
I have assumed this is correct because it is part of an older configuration I was asked not to modify.
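
For what it's worth, this is roughly how the slot count could be cross-checked. It is only a sketch: expected_slots and pe_slots are names I made up, and pe_slots assumes the PE definition is printed as "slots <number>" on one line, as `qconf -sp` does on the versions I have seen.

```shell
# Hypothetical helper: slots expected for a PE spanning N hosts
# with a given number of cores each.
expected_slots() {
    hosts=$1
    cores_per_host=$2
    echo $((hosts * cores_per_host))
}

# Hypothetical helper: read a PE definition on stdin and print the value
# of its "slots" line (assumes the "slots <number>" layout).
pe_slots() {
    awk '$1 == "slots" { print $2 }'
}

expected_slots 4 2   # prints 8 for MPI-4 on four dual-core hosts

# On the cluster the two values could then be compared, e.g.:
#   qconf -sp MPI-4 | pe_slots
```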

> Date: Mon, 16 Aug 2010 16:18:08 +0200
> From: reuti at staff.uni-marburg.de
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Cannot run because resources requested are not available
>
> Hi,
>
> On 16.08.2010 at 16:12, spow_ wrote:
>
> > I'm trying to copy the configuration I built on my test server (2 hosts) to the real one (20+ hosts).
> > The versions are slightly different, which may be the cause of the error: I am trying to copy my configuration onto an N1 6.0u2 installation, and did my tests on a slightly newer version (6.1, I think).
> > In the end, the new 6.2u6 should be installed on the cluster, but I'll probably be gone before this happens.
> >
> > When trying to run a parallel job which requests hard resources (e.g. qsub -hard -l mem_free=1G -l mem_token=1G
>
> why are you requesting two instances for the memory?
>
>
> > -pe "mpi*" 4 jobname) on the 6.0u2 installation managing the 20+ host cluster, I get the following messages when clicking the 'Why' button:
> >
> > Cannot run because resources requested are not available for parallel job.
> > Cannot run because available slots combined under PE "name_of_PE" are not in range of job.
>
> - PE attached to a queue - does a job w/o mem_free/token run?
> - Number of slots in the PE definition reflects the number of cores in the cluster?
>
> -- Reuti
>
>
> > The only difference I can think of between the two configurations is that the one I used to work on has a tight integration whereas the one I'm currently working on doesn't (i.e. control slaves = true, job is first task = true).
> > I have defined mem_token and mem_free as consumables, and added them to the host configuration.
> > I have been carefully reviewing host definitions, queue configurations, complex definitions ... and I cannot think of any mechanism that would block the jobs that way. I also tried different variations of the qsub command (removing -hard or part of the arguments, setting mem_token=50M ...), but it doesn't work.
> >
> > Thanks for reading,
> > have a nice day,
> > GQ
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=274720
>


