[GE users] Equivalent to PBS

Jesse Becker jbecker at northwestern.edu
Tue Sep 14 18:33:44 BST 2004

On Tue, Sep 07, 2004 at 05:47:27PM +0200, Reuti wrote:
> >the best suggestion I found after a search was to create two
> >similar PEs. one with $round_robin and one with $fill_up
> >If the users wish to have control over a choice of slots per node,
> >they choose the appropriate PE.
> > 
> >
> If you have applications, where you must be sure to have one or two 
> slots on all of the machine (like the "-mp n" in GAUSS_LFLAGS), you can 
> create two PEs with allocation rule 1 and 2, name them e.g. para1 and 
> para2. This way you can request them by name, and if it doesn't matter, 
> you can request para* and by looking at $PE in the jobscript you  will 
> know, where you ended up.
> By setting them up with $fill_up and $round_robin they will blur the 
> setup over time I think - Reuti

I do this with a set of 4 PEs.  Two would be enough, but I keep similar-speed
systems grouped together.

One of my clusters has 13 boxes (1 head node, 12 compute nodes).  The compute
nodes are of two different speeds, and I don't want to mix MPI jobs between
the nodes.  At the same time, several of the jobs are compiled with Intel's
compiler using the -parallel flag; this makes the programs automatically
parallelize on SMP boxes (which all of these are).  Submitting multi-threaded
jobs usually results in 4 processes per box (two for each slot).   I've found
this to be...sub-optimal...at best. ;-)

The solution has been to create 4 PEs:
	parallel1:       PE for non-threaded jobs on the slow boxes
	threadparallel1: PE for the threaded jobs on the slow boxes
	parallel2:       PE for non-threaded jobs on the fast boxes
	threadparallel2: PE for the threaded jobs on the fast boxes

The queues used by the threadparallel* PEs have only a single slot per box,
which lets the threaded jobs take both CPUs nicely.  I haven't had problems
(yet) with oversubscription between the parallel and threadparallel PEs...
(<knocks on wood>)
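
As a rough sketch of how a job script might cooperate with this setup, the
snippet below picks a per-process thread count from the PE the job landed in
(Grid Engine exports $PE for parallel jobs).  The PE names come from the list
above; the helper name threads_for_pe and the use of OMP_NUM_THREADS to limit
the threads of -parallel binaries are assumptions for illustration, not part
of the setup described here.

```shell
#!/bin/sh
# Hypothetical job script fragment; only the PE names come from the post.

# Choose threads per process from the PE name (helper name is made up).
# threadparallel* queues offer one slot per dual-CPU box, so a process
# there may use both CPUs; everything else stays single-threaded.
threads_for_pe() {
    case "$1" in
        threadparallel*) echo 2 ;;
        *)               echo 1 ;;
    esac
}

# Assumption: the threaded binaries honor OMP_NUM_THREADS.
OMP_NUM_THREADS=$(threads_for_pe "${PE:-}")
export OMP_NUM_THREADS

echo "PE=${PE:-none} threads per process=$OMP_NUM_THREADS"
```

Such a script would be submitted with something like
"qsub -pe threadparallel2 4 job.sh" (job.sh being a placeholder name), which
asks for 4 slots and therefore 4 boxes, since each box contributes one slot.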

Jesse Becker
GPG-fingerprint: BD00 7AA4 4483 AFCC 82D0  2720 0083 0931 9A2B 06A2

