[GE dev] Core binding suggestion...

dagru daniel.x.gruber at oracle.com
Tue Nov 16 09:09:43 GMT 2010

Hi Chris, 

(also in response to the OpenMPI discussion) the following has to 
be considered in general from the SGE/OGE side when using core binding:

- when using the -binding *pe* linear....

This is *only* useful under the following conditions:
* jobs run *exclusively* on the execution hosts (exclusive 
  host feature)
* all hosts have the same architecture 
* a fixed allocation rule is used (allocation_rule N)
Hence it is very limited. The reason is that the pe_hostfile 
content is created on the master host, and each entry 
contains the same <socket,core> tuples. This only makes 
sense in some special situations and cannot be used 
for job isolation. OGE does not do any binding itself; it just 
creates the content for the MPI library. 
If you have different conditions (multiple jobs per host, 
different architectures, etc.), don't use this!
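To illustrate, a pe_hostfile produced with "-binding pe linear:2" on 
two hosts might look roughly like this (hostnames and the exact column 
layout are assumed here; the point is that every entry carries the 
identical <socket,core> tuples, regardless of what is actually free on 
each host):

```
# host    slots  queue           <socket,core> tuples (layout assumed)
node01    2      batch.q@node01  0,0:0,1
node02    2      batch.q@node02  0,0:0,1
```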

- when using -binding *set* <strategy> or -binding <strategy>

Each involved shepherd is bound according to the rule. 
That means linear:2 results in 2 consecutive free cores 
being chosen on each host that the MPI master script 
connects to with qrsh -inherit. The chosen cores can differ 
from host to host. If one host does not have enough free cores, 
no binding is done, but only on that host. 
Because of this *fixed* request (linear:2) it only makes 
sense with a fixed allocation rule (allocation_rule 2), because 
with fill_up, for example, the number of cores you get can 
differ from the number of slots you get. A JSV is the right 
way to ensure this - you did it already ;).
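Such a check can be sketched as a server-side JSV script. The sketch 
below assumes Grid Engine's shell JSV API (jsv_include.sh with 
jsv_get_param, jsv_reject, jsv_accept, jsv_main); the PE name 
"mpi_fixed2" and the exact policy are hypothetical, so adapt both to 
your site:

```
#!/bin/bash
# Hedged sketch of a JSV that ties a fixed -binding request to a PE
# with a matching fixed allocation_rule. Requires a Grid Engine
# installation; "mpi_fixed2" is a made-up PE name.

jsv_on_verify()
{
   strategy=$(jsv_get_param binding_strategy)
   pe=$(jsv_get_param pe_name)

   # Only allow a linear binding request together with the
   # fixed-allocation PE, so cores requested == slots granted per host.
   if [ "$strategy" = "linear" ] && [ "$pe" != "mpi_fixed2" ]; then
      jsv_reject "linear binding is only allowed with the mpi_fixed2 PE"
      return
   fi
   jsv_accept "Job is accepted"
}

. "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"
jsv_main
```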

Please also be aware that the shepherd children are bound to 
these 2 cores, which does *not* mean that each thread 
of the child runs on a different core. It just means 
that the OS is not allowed to schedule the threads onto 
cores outside that set. 
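You can inspect the resulting affinity mask from inside a job (or any 
shell) with taskset, a standard Linux util-linux tool; this is a 
generic illustration, not OGE-specific:

```shell
# Show the core affinity list of the current shell process.
# All threads it spawns inherit this mask; within the allowed set
# the OS may still time-share several threads on one core.
taskset -cp $$
```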

This should also work when more OGE jobs are running 
on the hosts and when the job spans hosts with 
different numbers of cores.

Hope that helps. 




On 11/01/10 18:10, jewellc wrote:
> Hi,
> Thanks for the info, Daniel.  Sorry for taking so long to get back to the list.  I've been away, and then had a bunch of stuff to catch up on!
> So, continuing my quest to get GE to spawn two MPI processes on the same node, bound to different CPUs, I experimented by writing an interminable loop MPI job.  I submitted it to an execution node with the following:
>> qsub -pe mpi 4 -q batch.q@exec2 unterm.com
> with unterm.com:
> #!/bin/bash
> #
> # request Bourne shell as shell for job
> #$ -S /bin/bash
> #
> cd $HOME/mpi
> mpirun unterm
> #
> The result was all 4 processes running on the same core of the exec2 host (showing 25% processor activity per process, as you'd expect).  Attached is the spooldir/exec2/active_jobs/config file.  It looks very much as if 4 "qrsh --inherit" sessions are not being spawned. Is this a bug?  Should OGE be behaving in this way?  Might it be a bug in OpenMPI, and if so can anyone suggest a way to test this?
> Cheers,
> Chris
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=39&dsMessageId=291838
> To unsubscribe from this discussion, e-mail: [dev-unsubscribe at gridengine.sunsource.net].

