[GE users] mpich <-> sge --> controlling hosts machinefile

Gerolf Ziegenhain ziegen at rhrk.uni-kl.de
Wed Jul 4 19:50:18 BST 2007



Hi,

Maybe this is a very stupid question, but how do I control the number of jobs
per node? Consider the following hardware: 38 nodes with two processors
each. When I start a job with -pe mpich 8, it should use 4 nodes with
2 processes on each. What do I have to do to achieve this?

My parallel environment is configured like this:
qconf -sp mpich
pe_name           mpich
slots             60
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/N1GE/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /opt/N1GE/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
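
For reference, the placement the question expects from "allocation_rule $fill_up" can be modelled in a few lines. This is a simplified sketch, not Grid Engine's scheduler. It assumes every node offers exactly two free slots (one per CPU), uses hypothetical host names lc01 to lc38, and walks the hosts in a fixed order, whereas the real scheduler decides the host order itself.

```python
from typing import Dict, List, Tuple


def fill_up(requested: int, free_slots: List[Tuple[str, int]]) -> Dict[str, int]:
    """Simplified model of the $fill_up allocation rule.

    Hosts are taken in the given order and every free slot on a host is used
    before moving on to the next one. The real scheduler decides the host
    order itself (e.g. by load), which this sketch does not model.
    """
    allocation: Dict[str, int] = {}
    remaining = requested
    for host, free in free_slots:
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[host] = take
            remaining -= take
    if remaining > 0:
        raise ValueError(f"only {requested - remaining} of {requested} slots free")
    return allocation


# Hypothetical host names lc01..lc38, each assumed to offer 2 free slots.
hosts = [(f"lc{i:02d}", 2) for i in range(1, 39)]
print(fill_up(8, hosts))  # {'lc01': 2, 'lc02': 2, 'lc03': 2, 'lc04': 2}
```

Under these assumptions 8 slots land on 4 nodes with 2 slots each, i.e. the layout the question asks for.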

My mpich-queue has limits:
np_load_av=1
np_load_sh=1
n_slots=2

However, when I start a job, the PI1234 file looks something like this:
lc12.rhrk.uni-kl.de 0 prog
lc19 1 prog
lc19 1 prog
lc19 1 prog
lc14 1 prog
lc14 1 prog
lc13 1 prog
lc13 1 prog

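To check a run for oversubscription, the processes per host in the procgroup file can be tallied. This is a minimal sketch, assuming the usual MPICH ch_p4 procgroup convention (the count on the first line is the number of processes on the local host in addition to the one mpirun starts there itself) and assuming that short host names identify nodes.

```python
from collections import Counter

# Contents of the PI1234 procgroup file quoted above.
PROCGROUP = """\
lc12.rhrk.uni-kl.de 0 prog
lc19 1 prog
lc19 1 prog
lc19 1 prog
lc14 1 prog
lc14 1 prog
lc13 1 prog
lc13 1 prog
"""

CPUS_PER_NODE = 2  # each of the 38 nodes has two processors


def processes_per_host(text: str) -> Counter:
    """Count MPI processes per host in a ch_p4 procgroup file.

    Assumes the first line's count means processes started on the local
    host *in addition to* the one mpirun itself starts there.
    """
    counts: Counter = Counter()
    first = True
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0].split(".")[0]  # compare short host names
        n = int(fields[1])
        counts[host] += n + 1 if first else n
        first = False
    return counts


counts = processes_per_host(PROCGROUP)
overloaded = {h: n for h, n in counts.items() if n > CPUS_PER_NODE}
print(dict(counts))  # {'lc12': 1, 'lc19': 3, 'lc14': 2, 'lc13': 2}
print(overloaded)    # {'lc19': 3}
```

Applied to the file above, this gives 8 processes in total, as requested with -pe mpich 8, but 3 of them on lc19 and only 1 on lc12.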
So in particular there are three processes on lc19, which has only two CPUs.
One of these three would better be running on lc12. How can I fix this?


Thanks in advance,
   Gerolf




-- 
Dipl. Phys. Gerolf Ziegenhain
Office: Room 46-332 - Erwin-Schrödinger-Str.46 - TU Kaiserslautern - Germany
Web: gerolf.ziegenhain.com
