[GE users] MPI jobs can run without specifying PE with number of slots

Reuti reuti at staff.uni-marburg.de
Sun Nov 2 17:05:18 GMT 2008


Hey Einat,

On 02.11.2008 at 15:10, einat at post.tau.ac.il wrote:

> We have Grid Engine 6.0u8 installed on our cluster (which was  
> installed with Rocks).
>
> Basically - I want users that don't specify any PE (on the qsub  
> command, or in their script) - to get one CPU only.

Correct.

> I think it should be the default behaviour of SGE, but it's not the  
> case on our cluster - anybody can run an MPI job and get any number  
> of CPUs on the same node without specifying it on 'qsub' command line.

Also correct.

For now it is not planned that SGE binds a user's job to particular
cores, although this might be implemented in the future:

http://gridengine.sunsource.net/servlets/ReadMsg?listName=dev&msgNo=3297

The problem for now seems to be that any user can use "taskset" on
Linux on his own processes and assign them to any cores he likes. What
is missing is a taskset command restricted to the super-user.

A simple proof of concept can be found here:
http://groups.grid.org/content/simple-processorcore-affinity-sgeunicluster
But they do not check which cores are already in use. (Hidden in /proc,
in /proc/<pid>/status, there is a setting "Cpus_allowed" which could be
scanned. But it would be simpler if SGE just recorded the used cores.)
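
A minimal sketch of such a scan, assuming a Linux /proc layout where
/proc/<pid>/status carries a "Cpus_allowed:" line with a hexadecimal
mask. The function name cpus_allowed and the PROC_ROOT override are made
up for illustration; they are not part of SGE or of the proof of concept
above.

```shell
#!/bin/sh
# Print "<pid> <mask>" for each given PID, read from <root>/<pid>/status.
# PROC_ROOT is a hypothetical override, so the function can also be tried
# on saved copies of status files; it defaults to /proc.
cpus_allowed() {
    root="${PROC_ROOT:-/proc}"
    for pid in "$@"; do
        status="$root/$pid/status"
        # Skip processes that have exited or cannot be read.
        [ -r "$status" ] || continue
        # The line looks like "Cpus_allowed:<TAB>0f"; print the mask only.
        mask=$(awk '$1 == "Cpus_allowed:" { print $2 }' "$status")
        echo "$pid $mask"
    done
}

# Example: show the affinity mask of the current shell.
cpus_allowed $$
```

This only reports what each process is allowed to use; deciding which
cores are free would still have to be built on top of it.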

(As long as clusters had only single-core nodes, this was not an
issue. Even the "processors" setting available for some architectures
in the past applied not per job, but to all jobs in this queue on a
node.)

> I have run test MPI jobs with 4 slots without specifying it on the  
> qsub command. That's exactly the situation I want to prevent from  
> our users. I want SGE to be aware of the resources they are going  
> to use.
>
> My test script:
>
> #!/bin/tcsh -f
> mpd &
> mpiexec -n 4 /usr/local/src/mpich2-1.0.6p1/examples/cpi >& out
> mpdallexit

This is not a Tight Integration of a parallel application into SGE.
You can find a Howto here:
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

(MPICH2 1.0.6 is broken for the daemonless startup; you will need at
least 1.0.7.)
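
As a rough sketch of how the job script could look once a PE is set up
as described in the Howto: the script requests the slots with "-pe" and
starts only as many processes as SGE granted in $NSLOTS. The PE name
"mpich2", the helper granted_slots and the program path ./cpi are
placeholders here, and the exact mpiexec options (machinefile, startup
method) depend on which variant of the Howto you follow.

```shell
#!/bin/sh
#$ -pe mpich2 4
#$ -cwd

# Number of MPI processes to start: the slots SGE granted to this job.
# Falls back to 1 if NSLOTS is not set (e.g. when run outside of SGE).
granted_slots() {
    echo "${NSLOTS:-1}"
}

slots=$(granted_slots)
prog=./cpi    # placeholder for the MPI program

if command -v mpiexec >/dev/null 2>&1 && [ -x "$prog" ]; then
    # Never start more processes than SGE has accounted for.
    mpiexec -n "$slots" "$prog" > out 2>&1
else
    echo "mpiexec or $prog not available; would start $slots process(es)"
fi
```

Submitted this way, SGE knows about the 4 slots and can schedule
accordingly; a job submitted without "-pe" would start one process only.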

-- Reuti


>
>
> Thanks for your help,
> Einat Bielopolski
> Computing Division
> Tel-Aviv University
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

