[GE dev] Core binding suggestion...
daniel.x.gruber at oracle.com
Tue Nov 16 09:09:43 GMT 2010
(also in response to the OpenMPI discussion) The following points have
to be considered in general from the SGE/OGE side when using core binding:
- when using -binding *pe* linear ...
This is *only* useful under the following conditions:
* jobs are running *exclusively* on the execution hosts (exclusive host usage)
* all hosts have the same architecture
* a fixed allocation rule must be used (allocation_rule <N>)
Hence it is very limited. The reason is that the pe_hostfile
content is created on the master host and each entry
contains the same <socket,core> tuples. This only makes
sense in some special situations and cannot be used
for job isolation. OGE does not do any binding here; it just
creates the content for the MPI library.
If your conditions differ (multiple jobs per host,
different architectures, etc.), don't use this!
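To illustrate why the fixed allocation rule matters, here is a sketch of a pe_hostfile as the master host might generate it for a linear:2 pe-type binding. The file layout and the <socket,core> column are assumptions for illustration only, not taken from the OGE sources:

```shell
#!/bin/sh
# Hypothetical pe_hostfile for "-binding pe linear:2".  Columns
# (host, slots, queue instance, <socket,core> tuples) are assumed.
cat > pe_hostfile.example <<'EOF'
node01 2 batch.q@node01 0,0:0,1
node02 2 batch.q@node02 0,0:0,1
node03 2 batch.q@node03 0,0:0,1
EOF

# Every entry carries the *same* binding string: the master host
# cannot know the per-host core usage, so the tuples are only valid
# when the hosts are used exclusively, share one architecture, and
# get a fixed number of slots each (allocation_rule <N>).
distinct=$(awk '{print $4}' pe_hostfile.example | sort -u | wc -l)
echo "distinct binding strings: $distinct"
```

If jobs from other users were already bound on node02, the tuples 0,0 and 0,1 listed there would simply be wrong, which is why this mode is unusable for job isolation.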
- when using -binding *set* <strategy> or -binding <strategy>
Each involved shepherd is bound according to the rule.
That means linear:2 results in 2 consecutive free cores
being chosen on each host the MPI master script connects
to with qrsh --inherit. The chosen cores can differ from
host to host. If a host does not have enough free cores,
no binding is done, but only on that host.
Because of this *fixed* request (linear:2) it only makes
sense with "allocation_rule 2": with fill_up, for
example, the number of cores you get can differ from
the number of slots you get. A JSV is the right solution
to enforce this - you did it already ;).
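The consistency check such a JSV would perform can be sketched as a standalone shell function: with a fixed allocation rule of N slots per host, a linear:M binding request only lines up when M equals N. In a real shell JSV this logic would sit inside the verify callback using the helper functions from SGE's jsv_include.sh (the exact parameter names for the binding request are an assumption here):

```shell
#!/bin/sh
# Standalone sketch of the JSV decision: accept a job only when the
# requested binding amount (M from "linear:M") matches the fixed
# allocation rule (slots per host).  A real JSV would fetch these
# values with the jsv_* helpers and call the accept/reject functions.
check_binding() {
    alloc_rule=$1     # slots per host, i.e. the fixed allocation_rule
    bind_amount=$2    # M from the "linear:M" binding request
    if [ "$bind_amount" -eq "$alloc_rule" ]; then
        echo accept
    else
        echo reject
    fi
}

check_binding 2 2   # amounts match
check_binding 2 4   # more cores requested than slots per host
```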
Please also be aware that the shepherd's children are bound to
these 2 cores, which does *not* mean that each thread
of a child runs on a different core. It just means
that the OS is not allowed to schedule the threads onto
any other cores.
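One way to see this confinement from inside a job, assuming a Linux execution host, is to read the kernel's affinity mask for the current process:

```shell
#!/bin/sh
# Print the CPU affinity of the current shell (Linux /proc interface).
# A process bound with linear:2 would report exactly two cores here,
# e.g. "Cpus_allowed_list: 0-1"; its threads may migrate between those
# two cores, but the OS will not schedule them anywhere else.
grep Cpus_allowed_list /proc/self/status
```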
This should also work when multiple OGE jobs are running
on the hosts and when the job spans hosts with
different numbers of cores.
Hope that helps.
On 11/01/10 18:10, jewellc wrote:
> Thanks for the info, Daniel. Sorry for taking so long to get back to the list. I've been away, and then had a bunch of stuff to catch up on!
> So, continuing my quest to get GE to spawn two MPI processes on the same node, bound to different CPUs, I experimented by writing an interminable loop MPI job. I submitted it to an execution node with the following:
>> qsub -pe mpi 4 -q batch.q@exec2 unterm.com
> with unterm.com:
> # request Bourne shell as shell for job
> #$ -S /bin/bash
> cd $HOME/mpi
> mpirun unterm
> The result was all 4 processes running on the same core of the exec2 host (showing 25% processor activity per process, as you'd expect). Attached is the spooldir/exec2/active_jobs/config file. It looks very much as if 4 "qrsh --inherit" sessions are not being spawned. Is this a bug? Should OGE be behaving in this way? Might it be a bug in OpenMPI, and if so can anyone suggest a way to test this?
> To unsubscribe from this discussion, e-mail: [dev-unsubscribe at gridengine.sunsource.net].