[GE users] SGE + MVAPICH2 Loose Integration

Sangamesh B forum.san at gmail.com
Fri Sep 5 13:51:08 BST 2008


Hi All,

      The cluster has 33 nodes (quad-core, dual-processor) with Mellanox
InfiniBand hardware.

The compute nodes have the following hostnames:

Ethernet port:    compute-0-0.local compute-0-0 c0-0    compute-0-1.local compute-0-1 c0-1
...    compute-0-31.local compute-0-31 c0-31

InfiniBand port:    ibc0    ibc1
...    ibc31

For the parallel job submission I used the PE mpich2 (which was set up for MPICH2):
# qconf -sp mpich2
pe_name           mpich2
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/gridengine/mpi/startmpi.sh $pe_hostfile
stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
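One loose-integration approach (a sketch only, not something I have tested) would be to clone this PE and point start_proc_args at a wrapper that builds an IB-based machinefile instead of the Ethernet one. The PE name mvapich2 and the wrapper path /opt/gridengine/mpi/startmvapich2.sh below are illustrative assumptions, not existing files:

pe_name           mvapich2
slots             9999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/gridengine/mpi/startmvapich2.sh $pe_hostfile
stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
allocation_rule   $fill_up
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min

Jobs would then request -pe mvapich2 16 instead of -pe mpich2 16, leaving the stock mpich2 PE untouched for MPICH2 jobs.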

The SGE script is as follows:

#$ -q all.q
#$ -cwd
#$ -e Err.$JOB_NAME.$JOB_ID
#$ -o Out.$JOB_NAME.$JOB_ID
#$ -pe mpich2 16

/data/mvapich2_intel/bin/mpirun  -machinefile  $TMPDIR/machines  -np
$NSLOTS     /data/apps/namd26_mvapich2/Linux-mvapich2/namd2

It didn't run, and gave the following error:

$ cat Out.NAMD_SGE_PRL.22
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
mpiexec: unable to start all procs; may have invalid machine names
    remaining specified hosts: (compute-0-31.local) (compute-0-15.local)

The error is obvious: MVAPICH2's mpdboot goes through the IB
interface IP addresses (173.xx.x.xx series), but since the PE is mpich2, the
startmpi.sh script prepares the machinefile from the Ethernet hostnames.

Then I followed the document "MVAPICH Integration with SGE" at:


But that document doesn't apply to MVAPICH2, as there is no mpirun_rsh, etc.

Does anyone on the list have a solution for this? What needs to be changed in the
startmpi.sh script?
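As a sketch of what startmpi.sh (or a wrapper around it) could do, a small filter can rewrite the Ethernet hostnames SGE puts in $TMPDIR/machines into the IB hostnames listed above. The compute-0-N to ibcN mapping is taken from this cluster's naming scheme; the function name eth2ib and the machines.ib path are illustrative assumptions:

```shell
#!/bin/sh
# eth2ib: rewrite Ethernet hostnames (compute-0-N or compute-0-N.local)
# into the corresponding IB hostnames (ibcN), one per input line.
# The mapping follows the naming scheme described above; adjust the
# pattern if your site uses different names.
eth2ib() {
    sed -e 's/\.local//' -e 's/^compute-0-\([0-9][0-9]*\)/ibc\1/'
}

# In a modified startmpi.sh one could then do something like
# (paths illustrative):
#   eth2ib < "$TMPDIR/machines" > "$TMPDIR/machines.ib"
# and point mpdboot/mpirun at $TMPDIR/machines.ib instead.
```

The slot counts that startmpi.sh appends after each hostname (if any) are left untouched, since the pattern is anchored to the start of the line.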

Thank you,
