[GE users] Per job limit of PE slots

Hristo Iliev hristo at phys.uni-sofia.bg
Thu Jun 21 16:27:34 BST 2007



On Thu, 2007-06-21 at 15:39 +0200, Reuti wrote:
> On 21.06.2007 at 14:58, Hristo Iliev wrote:
> 
> > We have a 4-node Linux cluster with 8-core nodes that runs a mix of
> > serial, OpenMP and MPI jobs. 3 nodes are used for batch processing only
> > and one node is reserved for interactive jobs. Grid Engine 6.1 is used.
> > I would like to limit each parallel batch job to a certain number of
> > slots depending on the value of h_rt requested:
> > - short test/benchmark jobs (up to 1 hour) that can eat up to 24 slots each
> > - medium-length jobs (up to 1 day) that can eat up to 16 slots each
> > - long-running jobs (up to 1 week) that can eat up to 8 slots each
> > I have set up three cluster queues (with different h_rt limits) spanning
> > all 3 batch nodes and set 'slots=8' in the exechost definitions to
> > prevent oversubscription. The OpenMP PE uses the $pe_slots allocation
> > policy, so it is automagically limited to 8 slots, but I am having a very
> > hard time trying to convince SGE not to let long and medium MPI jobs use
> > more slots than the policy defines. I can successfully limit user slots
> > in each queue with resource quotas but I cannot do it on a per-job basis.
> > What I would like to achieve is to allow two (or even three) long jobs
> > from the same user running in parallel on 8 cores each but to deny one
> > long job taking 7 or more cores.
> 
> Do you mean "...9 or more cores..."?
>

Hi, Reuti,

That's right - 9 or more cores/slots. I'm a little oversubscribed right
now and am making lots of mistakes :)

> This is not possible for now. But there is already an RFE to cover
> some of these aspects:
> 
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2147
> 
> and
> 
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2148
> 

I see. Thank you for pointing it out. Is there some kind of workaround
for 6.1? I am seeing numerous web pages from other research facilities
that are also using Sun Grid Engine and have somehow limited the number
of slots per task. Or should I consider using subordinate queues with
some share policy and see if our parallel tasks can survive a
SIGSTOP/SIGCONT preemption?
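
To make the intended policy concrete, it can be written down as a small
pre-submission check, for example inside a wrapper around qsub. This is
only a sketch in Python: the POLICY table, the function names and the
wrapper idea itself are illustrative assumptions, not something
Grid Engine 6.1 provides. It assumes h_rt is given either as plain seconds
or as h:m:s.

```python
# Hypothetical per-job policy: (maximum h_rt in seconds, maximum slots per job).
# Entries are ordered from the shortest to the longest runtime class.
POLICY = [
    (1 * 3600, 24),        # short jobs: up to 1 hour, up to 24 slots
    (24 * 3600, 16),       # medium jobs: up to 1 day, up to 16 slots
    (7 * 24 * 3600, 8),    # long jobs: up to 1 week, up to 8 slots
]


def parse_h_rt(value):
    """Convert an h_rt string (plain seconds or h:m:s) into seconds."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError("unsupported h_rt format: %r" % value)


def max_slots(h_rt_seconds):
    """Return the slot limit of the first runtime class the job fits into."""
    for limit, slots in POLICY:
        if h_rt_seconds <= limit:
            return slots
    return 0  # longer than one week: not allowed at all


def job_allowed(h_rt, slots):
    """True if a job requesting this h_rt and slot count fits the policy."""
    return 1 <= slots <= max_slots(parse_h_rt(h_rt))
```

With this table, a one-week job asking for 8 slots passes and the same job
asking for 9 slots is rejected, while several 8-slot long jobs from one
user are each accepted independently. A check like this only guards
submissions that go through the wrapper; it does not stop a direct qsub.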

Sorry for bothering you but I'm really new to the administration of SGE.
Life was far more simple when I was only a user :)

Hristo

> -- Reuti




