[GE users] mpich2 configuration

markhewitt mh613 at york.ac.uk
Tue Nov 3 15:08:57 GMT 2009


Never mind, it's all working properly now! (The usual way things happen!)


markhewitt wrote:
> I'm having problems getting my mvapich2 (an implementation of mpich2) 
> working via SGE using tight integration.
>
> I'm using the MPD method and everything seems to work ok when I run 
> things manually
> e.g.
> ./mpdboot -n 18 -f hostfile
> ./mpdtrace
>  <shows list of machines>
> ./mpirun -machinefile hostfile -n 18 ./cpi
> This runs and shows the expected output. In addition, I can see the 
> processes running on each individual node.
>
> I've set up a parallel environment for it as per reuti's explanation page:
> pe_name            mpich2_mpd
> slots              8
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/n1ge6/mpich2_mpd/startmpich2.sh -catch_rsh \
>                   $pe_hostfile /wrg/software/SL4.x86-64/mvapich2
> stop_proc_args     /opt/n1ge6/mpich2_mpd/stopmpich2.sh -catch_rsh \
>                   /wrg/software/SL4.x86-64/mvapich2/
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary FALSE
>
> Plus I've compiled the helper applications with ./aimk; ./install.sh 
> which worked fine.
>
> Now I come to submit a job to the SGE cluster. This is my submission file 
> (mpich2_mpd.sh):
>
>
> #$ -N MVAPICH2
> #$ -cwd
> #$ -V
> #$ -pe mpich2_mpd 1
> #$ -l mpi_htx2=true  ## This is just because only some machines have working IB cards
>
> export MPICH2_ROOT=/wrg/software/SL4.x86_64/mvapich2
> export PATH=$MPICH2_ROOT/bin:$PATH
> export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
>
> echo "Got $NSLOTS slots."
> # The order of arguments is important. First global, then local options.
> mpiexec -machinefile $TMPDIR/machines -n $NSLOTS ~/cpilog
> exit 0
>
>
> I then submit the job using qsub mpich2_mpd.sh
>
> I get back a .pe file with the following contents:
> usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
>
> where: 'hostname' gives the name of the target host
> Host key verification failed.
>
> Sometimes I don't get the usage error for some reason, but I always get 
> "Host key verification failed". I've checked and SSH is enabled without 
> password between all MPI hosts, plus the MVAPICH1 installation we have 
> on SGE works.
>
> Does anyone have any ideas what configuration errors I have here? I'm 
> fairly sure it's a configuration error with SGE rather than my 
> MPI/MVAPICH installation as everything works ok when I run things 
> outside of SGE.
>
> Many thanks for your help.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224816
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
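
For anyone finding this thread later: "Host key verification failed" is the message OpenSSH prints when it refuses a host whose key it cannot verify. The thread does not say what the actual fix was, so the following is only a minimal sketch for narrowing the problem down, not the solution used here. It assumes a hostfile whose first column is the host name (as in the SGE PE hostfile), and it tries a non-interactive ssh to each host so that a missing or changed host key fails instead of prompting. The function name check_hosts and the SSH_CMD override are made up for this example.

```shell
#!/bin/sh
# Sketch: test non-interactive SSH to every host named in a hostfile.
# check_hosts and SSH_CMD are hypothetical names used only in this example.

check_hosts() {
    hostfile=$1
    failed=0
    # The first column of each non-empty line is taken as the host name;
    # sort -u drops hosts that are listed more than once.
    for host in $(awk 'NF { print $1 }' "$hostfile" | sort -u); do
        # BatchMode=yes: never prompt for a password or passphrase.
        # StrictHostKeyChecking=yes: fail if the host key is unknown or changed.
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o StrictHostKeyChecking=yes \
               "$host" true >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}
```

Calling check_hosts "$PE_HOSTFILE" from inside a job script, and again with the same hostfile from an interactive login, should show whether the hosts only fail in the environment the job runs in (for example a different user, home directory or known_hosts file).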

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224835
