[GE users] Resource allocation strategies

craffi dag at sonsorol.org
Fri Dec 12 12:14:49 GMT 2008

Hi Frank,

Short answers inline ...

On Dec 12, 2008, at 6:43 AM, Frank Olaf Sem-Jacobsen wrote:

> Dear community,
> I have spent a few hours going through various documents and the
> mailing list archive to figure out an answer to the following question.
> How are the individual processors/nodes selected for allocation to a
> specific job? I know you can specify that you want certain types of
> architectures, a specific amount of memory, and so forth, but given
> that there are a number of nodes that satisfy these requirements, how
> is one or more specifically chosen?

SGE works to find the best possible remote host to run your job on.

When more than one host meets these criteria, the default behavior is
to sort the list of available hosts by load average, so the end result
in default mode is that "SGE will run your job on the least busy of the
most suitable machines".

You can override that final sort on load manually by assigning each
node's queue instance an integer-based "sequence number" (seq_no) and
setting the scheduler's queue_sort_method to seqno. The final subsort
is then done on your custom integers. This is one way to influence
node selection.
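
As a rough sketch of the sequence-number approach, the two settings
involved look something like the excerpts below. The host names
node01 to node03 and the queue all.q are made up for illustration, and
the lines starting with # are notes for the reader, not part of the
configuration. Queue instances with lower numbers are tried first.

```text
# Scheduler configuration excerpt (edit with: qconf -msconf)
# Assumption: the site currently uses the default, "load".
queue_sort_method  seqno

# Cluster queue excerpt (edit with: qconf -mq all.q)
# Default seq_no of 0, with per-host overrides for hypothetical hosts.
seq_no             0,[node01=1],[node02=2],[node03=3]
```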

> The reason for asking is that I would like to exploit locality in the
> processor allocation. This means that if I need 10 processors/nodes I
> would like them to be physically close to each other (with a short
> path between them through the network). For instance, in a fat tree
> or a mesh topology it is clearly defined which are the nearest nodes
> to each other.
> Is this in any way supported, or are available nodes chosen more or
> less at random from a list?

Topology-aware scheduling is possible using wildcard selectors on
parallel environments or hostgroups. It goes something like this:

Assume you have multiple racks of servers; each server is connected to  
an in-cabinet aggregation switch.

You want to keep your parallel jobs within a single cabinet because  
that means the application traffic never needs to leave the in-cabinet  
switch backplane.

This is done by:

(a) Creating SGE parallel environments that reflect the topology
units, one per cabinet (MPICH1, MPICH2, MPICH3, etc.), with each PE
attached only to the queue instances on that cabinet's hosts

(b) Submitting the job using a wildcard selector (quoted so the shell
does not expand it), followed by the number of slots:

     $ qsub -pe 'MPICH*' 32 ./my-parallel-application

It's a bit cleaner to do this with PEs because you can further control
the dispersal of tasks (for example through each PE's allocation rule).
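
To make the PE variant more concrete, here is a minimal sketch of how
the pieces might fit together. It assumes three hostgroups @RACK1 to
@RACK3 already exist and that everything lives in all.q; the slot
count and the $fill_up rule are placeholder choices, not
recommendations. Lines starting with # are notes for the reader and
are not part of the configuration. Only a few fields are shown; a real
PE definition has more.

```text
# PE excerpt, one such PE per cabinet (view with: qconf -sp MPICH1)
pe_name          MPICH1
slots            64
allocation_rule  $fill_up

# Cluster queue excerpt (edit with: qconf -mq all.q)
# Each PE is offered only on the hosts of one cabinet.
hostlist         @RACK1 @RACK2 @RACK3
pe_list          NONE,[@RACK1=MPICH1],[@RACK2=MPICH2],[@RACK3=MPICH3]
```

Because a job is granted exactly one PE, a request for 'MPICH*' can
only be satisfied by slots that all belong to the same cabinet.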

You can also do it with physical hostgroups:

(a) Make hostgroups named by topology: @RACK1, @RACK2, @RACK3, etc.

(b) Submit to a particular hostgroup set (quoted so the shell does not
expand the wildcard):

     $ qsub -q 'all.q@@RACK*'

I believe the above example would preferentially pack your job within
a cabinet (it's early AM here and my thinking is still fuzzy ... )
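
For reference, a hostgroup for one cabinet might be defined roughly as
below. This is only an illustration: the host names are invented, and
the line starting with # is a note for the reader rather than part of
the definition.

```text
# Hostgroup definition (view with: qconf -shgrp @RACK1)
group_name  @RACK1
hostlist    node01 node02 node03 node04
```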


> Any feedback is greatly appreciated, and if there is a document that
> describes this please let me know.
> Sincerely,
> -- 
> Frank Olaf Sem-Jacobsen
> Ph.D.
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92365
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

