[GE users] mpich2 configuration
mh613 at york.ac.uk
Tue Nov 3 13:53:08 GMT 2009
I'm having problems getting my MVAPICH2 installation (an implementation
of MPICH2) working under SGE using tight integration.
I'm using the MPD method and everything seems to work ok when I run
./mpdboot -n 18 -f hostfile
<shows list of machines>
./mpirun -machinefile hostfile -n 18 ./cpi
This runs and shows the expected output, and I can see the processes
running on each individual node.
I've set up a parallel environment for it as per Reuti's explanation page:
start_proc_args /opt/n1ge6/mpich2_mpd/startmpich2.sh -catch_rsh \
stop_proc_args /opt/n1ge6/mpich2_mpd/stopmpich2.sh -catch_rsh \
Plus I've compiled the helper applications with ./aimk; ./install.sh
which worked fine.
Now I come to submit a job to the SGE cluster. This is my submission file:
#$ -N MVAPICH2
#$ -pe mpich2_mpd 1
#$ -l mpi_htx2=true ## This is just because only some machines have working IB cards
echo "Got $NSLOTS slots."
# The order of arguments is important. First global, then local options.
mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ~/cpilog
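The job script reads $TMPDIR/machines, which the PE start script is expected to build from the $PE_HOSTFILE that SGE hands to it. As a rough sketch of that step (not the actual startmpich2.sh code), the function below expands a file in the PE hostfile layout (hostname, slot count, queue, processor range per line) into one hostname per slot. The name expand_pe_hostfile is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Sketch only: expand an SGE PE hostfile into an MPI-style machine file.
# expand_pe_hostfile is a hypothetical helper name, not part of SGE or MPICH2.
# Each input line is assumed to look like: <host> <slots> <queue> <processor range>
expand_pe_hostfile() {
    while read -r host slots _queue _range; do
        # Skip blank lines
        [ -z "$host" ] && continue
        i=0
        # Print the hostname once for every slot granted on that host
        while [ "$i" -lt "$slots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Possible use inside a job, for comparison with the generated file:
#   expand_pe_hostfile "$PE_HOSTFILE" | diff - "$TMPDIR/machines"
```

If $TMPDIR/machines is missing or empty inside the job, that would suggest the start script exited before reaching this step, which would be consistent with the usage message in the .pe file.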
I then submit the job using qsub mpich2_mpd.sh.
I get back a .pe file with the following contents:
usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
where: 'hostname' gives the name of the target host
Host key verification failed.
Sometimes I don't get the usage error for some reason, but I always get
"Host key verification failed". I've checked that passwordless SSH is
enabled between all MPI hosts, and the MVAPICH1 installation we have
on SGE works.
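One way to narrow down the host key message is to repeat the SSH check non-interactively, using exactly the hostnames SGE passes to the job (these may differ from the ones typed by hand, for example short names versus fully qualified names). The sketch below assumes a file with one hostname as the first field of each line. check_hosts and the SSH_CMD override are hypothetical names introduced here; BatchMode and StrictHostKeyChecking are standard OpenSSH options.

```shell
#!/bin/sh
# Sketch only: test non-interactive SSH logins to every host listed in a file.
# check_hosts and SSH_CMD are hypothetical names used for illustration.
check_hosts() {
    hostfile=$1
    ssh_cmd=${SSH_CMD:-ssh}   # allows substituting another command for testing
    failed=0
    while read -r host _rest; do
        # Skip blank lines
        [ -z "$host" ] && continue
        # -n keeps ssh from consuming the loop's stdin.
        # BatchMode=yes disables all prompts, so a missing or changed host key
        # makes the login fail instead of asking for confirmation.
        if $ssh_cmd -n -o BatchMode=yes -o StrictHostKeyChecking=yes "$host" true; then
            echo "ok $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}

# Possible use from within a job script:
#   check_hosts "$PE_HOSTFILE"
```

Running this as the job's user from inside a submitted job, rather than from a login shell, should show whether any host fails only in the SGE environment.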
Does anyone have any ideas what configuration errors I have here? I'm
fairly sure it's a configuration error with SGE rather than my
MPI/MVAPICH installation as everything works ok when I run things
outside of SGE.
Many thanks for your help.