[GE users] puzzling MPICH behaviour with GE 5.3

Carlo Nardone Carlo.Nardone at Sun.COM
Mon Aug 23 14:18:59 BST 2004

Hi all,
I'm using GE 5.3p5 with MPICH on a Rocks 3.2.0
cluster of dual-Opteron boxes.

When I launch a parallel job with tight integration,
the node on which the master MPI process starts never
seems to receive more than one MPI process, so another
node ends up loaded with 3 MPI processes rather than 2.
I tested this behaviour with the simplest possible MPI
code, which just reports the hostname of each process.
Here is my job script:

[cmn@frontend-0]$ cat mpi.sh
#$ -cwd
#$ -j y
#$ -S /bin/sh
echo "mpi.sh on $HOSTNAME machinefile $TMPDIR/machines"
cat $TMPDIR/machines
$MPI_HOME/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mpiprocs.x

The following are some of the outputs:

$ qsub -pe mpich 3-3 mpi.sh
$ cat mpi.sh.o308
mpi.sh on compute-0-1.local machinefile /tmp/308.1.compute-0-1.q/machines
Process 1 of 3 on compute-0-3.local
Process 2 of 3 on compute-0-3.local
Process 0 of 3 on compute-0-1.local

$ qsub -pe mpich 8-8 mpi.sh
$ cat mpi.sh.o312
mpi.sh on compute-0-5.local machinefile /tmp/312.1.compute-0-5.q/machines
Process 2 of 8 on compute-0-6.local
Process 3 of 8 on compute-0-7.local
Process 4 of 8 on compute-0-7.local
Process 5 of 8 on compute-0-8.local
Process 6 of 8 on compute-0-8.local
Process 7 of 8 on compute-0-6.local
Process 1 of 8 on compute-0-6.local
Process 0 of 8 on compute-0-5.local
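
To make the imbalance explicit, the ranks per host can be
tallied from the job output. This is only a minimal sketch,
assuming the "Process N of M on HOST" format shown above;
count_per_host is a made-up helper name.

```shell
#!/bin/sh
# count_per_host (hypothetical helper): read job output on stdin and print
# "HOST COUNT" for every host named in a "Process N of M on HOST" line.
count_per_host() {
    awk '/^Process [0-9]+ of [0-9]+ on / { n[$6]++ }
         END { for (h in n) print h, n[h] }' | sort
}

# Feed it the 8-slot job output quoted above.
count_per_host <<'EOF'
mpi.sh on compute-0-5.local machinefile /tmp/312.1.compute-0-5.q/machines
Process 2 of 8 on compute-0-6.local
Process 3 of 8 on compute-0-7.local
Process 4 of 8 on compute-0-7.local
Process 5 of 8 on compute-0-8.local
Process 6 of 8 on compute-0-8.local
Process 7 of 8 on compute-0-6.local
Process 1 of 8 on compute-0-6.local
Process 0 of 8 on compute-0-5.local
EOF
```

For job 312 this reports one rank on compute-0-5, three on
compute-0-6 and two each on compute-0-7 and compute-0-8.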

On one occasion, when every node but one was busy
with a large parallel code, I got the following
puzzling job output:

$ qsub -pe mpich 2-2 mpi.sh
$ cat mpi.sh.o316
mpi.sh on compute-0-1.local machinefile /tmp/316.1.compute-0-1.q/machines
Could not find enough machines for architecture LINUX

BTW, I wonder why this job aborted rather than
being left waiting in the queue!
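
The log shows that mpi.sh did start on compute-0-1, and the
error text looks like it is printed by mpirun rather than by
the scheduler, so a check of the machinefile before launching
might narrow things down. A minimal sketch under that
assumption; check_machines is a made-up helper name, and the
exit code in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# check_machines (hypothetical helper): succeed only if the machinefile FILE
# is readable and has at least NSLOTS non-empty lines.
check_machines() {
    file=$1
    want=$2
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    # Count non-empty lines; each one is a slot handed to mpirun.
    have=$(awk 'NF { n++ } END { print n + 0 }' "$file")
    echo "machinefile has $have entries, $want slots requested"
    [ "$have" -ge "$want" ]
}

# In mpi.sh this could run just before mpirun, for example:
#   check_machines "$TMPDIR/machines" "$NSLOTS" || exit 1
```

If the check fails, the problem would lie in how the
machinefile is generated rather than in mpirun itself.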

Here is my PE definition:

$ qconf -sp mpich
pe_name           mpich
queue_list        all
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
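
For reference, my understanding is that startmpi.sh builds
$TMPDIR/machines by expanding each $pe_hostfile line (host
name followed by a slot count) into that many host lines.
A minimal sketch of that expansion, assuming that file format;
hostfile_to_machines is a made-up name, not the function in
the stock script, and the sample input is invented.

```shell
#!/bin/sh
# hostfile_to_machines (hypothetical helper): read pe_hostfile-style lines
# "HOST NSLOTS ..." on stdin and print HOST once per slot.
hostfile_to_machines() {
    while read -r host nslots rest; do
        # Skip blank or incomplete lines.
        [ -n "$nslots" ] || continue
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done
}

# Invented sample: a 3-slot allocation over hosts with 2 slots each.
hostfile_to_machines <<'EOF'
compute-0-1.local 2 compute-0-1.q UNDEFINED
compute-0-3.local 1 compute-0-3.q UNDEFINED
EOF
```

Comparing such an expansion of the real $PE_HOSTFILE with
the contents of $TMPDIR/machines would show whether the two
agree on how many slots each host was given.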

Here is the scheduler configuration:

$ qconf -ssconf
algorithm                  default
schedule_interval          0:0:4
maxujobs                   0
queue_sort_method          share
user_sort                  false
job_load_adjustments       np_load_avg=0.0
load_adjustment_decay_time 0:0:0
load_formula               np_load_avg
schedd_job_info            true
sgeee_schedule_interval    0:2:0
halftime                   168
usage_weight_list          cpu=1,mem=0,io=0
compensation_factor        5
weight_user                0.2
weight_project             0.2
weight_jobclass            0.2
weight_department          0.2
weight_job                 0.2
weight_tickets_functional  0
weight_tickets_share       0
weight_tickets_deadline    0

Finally, here is a sample queue definition
(BTW, the association between queue name
and hostname is correct for all queues):

$ qconf -sq compute-0-0.q
qname                compute-0-0.q
hostname             compute-0-0
seq_no               0
load_thresholds      np_load_avg=1.75
suspend_thresholds   NONE
nsuspend             1
suspend_interval     00:05:00
priority             0
min_cpu_interval     00:05:00
processors           UNDEFINED
rerun                FALSE
slots                2
tmpdir               /tmp
shell                /bin/csh
shell_start_mode     NONE
prolog               NONE
epilog               NONE
starter_method       NONE
suspend_method       NONE
resume_method        NONE
terminate_method     NONE
notify               00:00:60
owner_list           NONE
user_lists           NONE
xuser_lists          NONE
subordinate_list     NONE
complex_list         NONE
complex_values       NONE
projects             NONE
xprojects            NONE
calendar             NONE
initial_state        default
fshare               0
oticket              0
s_rt                 INFINITY
h_rt                 INFINITY
s_cpu                INFINITY
h_cpu                INFINITY
s_fsize              INFINITY
h_fsize              INFINITY
s_data               INFINITY
h_data               INFINITY
s_stack              INFINITY
h_stack              INFINITY
s_core               INFINITY
h_core               INFINITY
s_rss                INFINITY
h_rss                INFINITY
s_vmem               INFINITY
h_vmem               INFINITY

Thank you very much for any hints/suggestions.

Carlo Nardone                   Sun Microsystems Italia SpA
Technical Systems Ambassador    Client Services Organization
Grid and HPTC Specialist        Practice Data Center - Platform Design

Tel. +39 06 36708 024           via G. Romagnosi, 4
Fax. +39 06 3221969             I-00196 Roma
Mob. +39 335 5828197            Italy
Email: carlo.nardone at sun.com
"From nothing to more than nothing."
(Brian Eno & Peter Schmidt, _Oblique Strategies_)
