[GE users] Integration of the MPICH2 and SGE

gqc606 gqc606 at hotmail.com
Sat May 15 15:06:06 BST 2010

  Hello, I installed Rocks 5.3 on my computers and would like to use SGE to manage my MPICH2 jobs. On this system, MPICH2 is started with the daemonless smpd method.
This is my parallel environment:
[test@cluster ~]$ qconf -sp mpich
pe_name            mpich
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
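
As far as I understand it, the start_proc_args step turns the pe_hostfile (one line per host: host name, slot count, queue, processor range) into $TMPDIR/machines with one line per granted slot. Below is a rough sketch of that expansion, not the actual startmpi.sh; the function name expand_pe_hostfile and the sample host lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): expand a pe_hostfile into a
# machines list with one host name per granted slot.
# Each pe_hostfile line looks like: <host> <slots> <queue> <processor range>
expand_pe_hostfile() {
    while read -r host nslots _rest; do
        # Skip empty lines
        [ -z "$host" ] && continue
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Example with made-up data: 6 slots spread over two hosts
sample=$(mktemp)
printf '%s\n' \
    'compute-0-1.local 4 all.q@compute-0-1.local UNDEFINED' \
    'compute-0-0.local 2 all.q@compute-0-0.local UNDEFINED' > "$sample"
expand_pe_hostfile "$sample"
rm -f "$sample"
```

With "-pe mpich 6" the resulting list should have six lines, which is what the -machinefile option in the job script is expected to read.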

This is my script:
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -N flat_airebo
#$ -pe mpich 6
#$ -q all.q
#$ -e error.out
#$ -o screen.out

export MPICH2_ROOT=/opt/mpich2/gnu
export PATH=$MPICH2_ROOT/bin:$PATH
export MPIEXEC_RSH=rsh

mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines /home/test/mpi-ring

But when I submit my script, the following error occurs:
-catch_rsh /opt/gridengine/default/spool/compute-0-1/active_jobs/179.1/pe_hostfile
mpiexec_compute-0-1.local: cannot connect to local mpd (/tmp/mpd2.console_test); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.
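
The "cannot connect to local mpd" text suggests that the mpiexec found in PATH may be the mpd-based launcher rather than the smpd one. A small check that could be placed in the job script before the mpiexec line is sketched below. It assumes that an mpd-based mpiexec is an interpreter script starting with "#!" while the smpd one is a compiled program; the helper name launcher_kind is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: report whether a launcher file is an interpreter script.
# Assumption: an mpd-based mpiexec starts with "#!"; anything else
# (including a missing file) is reported as "not a script".
launcher_kind() {
    if [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ]; then
        echo "script"
    else
        echo "not a script"
    fi
}

# Which mpiexec does the job's PATH resolve to?
mpiexec_path=$(command -v mpiexec || true)
if [ -n "$mpiexec_path" ]; then
    echo "mpiexec resolves to: $mpiexec_path ($(launcher_kind "$mpiexec_path"))"
else
    echo "no mpiexec found in PATH"
fi
```

The output would end up in screen.out and shows which installation the job really uses.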

I don't know where the error comes from. I have read the page <http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html> several times and could not find my mistake. Can anyone give me some advice? Thanks!


