[GE users] Integration of the MPICH2 and SGE

stephane_vaillant stephane.vaillant at imcce.fr
Wed May 26 16:21:22 BST 2010


gqc606 wrote:
> Hi,
>   I configured and compiled MPICH2, attached the PE to the cluster queue of my choice, and adjusted the path to my MPICH installation successfully. I did exactly as these pages show:
> 
> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
> http://marc.info/?l=npaci-rocks-discussion&m=127481216829722&w=2
>  
> 
> According to "http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257043",
> Reuti says that I also have to edit one line of the provided
> "startmpich2.sh" script to make it work correctly with Rocks:
> 
>    # vi $SGE_ROOT/mpich2_mpd/startmpich2.sh
> 
>    Jump down to line 176 where it says:
> 
>    NODE=`hostname`
> 
>    and change it to:
> 
>    NODE=`hostname --short`


I've patched startmpich2.sh with the following command:

sed -i -e 's/.*host=.*cut.*cut.*/host=`echo $line|cut -f1 -d" "`/' $INSTALLDIR/mpich2_mpd/startmpich2.sh

But I think Reuti's method is much simpler.
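
To illustrate what the sed patch above changes, here is a minimal sketch. It assumes the unpatched line pipes the hostfile entry through two cut calls, the second one stripping the domain part (the sed pattern suggests this, but check your own copy of the script). The sample hostfile line, including the host name and queue, is made up for the example.

```shell
# Hypothetical pe_hostfile line: host, slots, queue, processor range.
line='compute-0-1.local 3 all.q@compute-0-1.local UNDEFINED'

# Two cuts (assumed unpatched form): first field, then the part before the first dot.
short_host=$(printf '%s\n' "$line" | cut -f1 -d" " | cut -f1 -d".")

# One cut (patched form): first field only, i.e. the name exactly as SGE wrote it.
full_host=$(printf '%s\n' "$line" | cut -f1 -d" ")

echo "two cuts: $short_host"
echo "one cut:  $full_host"
```

Either way, the goal is the same: the host name taken from the hostfile and the name stored in NODE have to be in the same form (both short, or both fully qualified).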


> And I have done that. But strangely, when I submitted my script, the following errors occurred:
>  
> -catch_rsh /opt/gridengine/default/spool/compute-0-1/active_jobs/240.1/pe_hostfile /opt/mpich2/gnu
> compute-0-1:3
> compute-0-0:3
> usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
> where: 'hostname' gives the name of the target host
> usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
> where: 'hostname' gives the name of the target host
> startmpich2.sh: check for mpd daemons (1 of 10)
> startmpich2.sh: check for mpd daemons (2 of 10)
> startmpich2.sh: check for mpd daemons (3 of 10)
> startmpich2.sh: check for mpd daemons (4 of 10)
> startmpich2.sh: check for mpd daemons (5 of 10)
> startmpich2.sh: check for mpd daemons (6 of 10)
> startmpich2.sh: check for mpd daemons (7 of 10)
> startmpich2.sh: check for mpd daemons (8 of 10)
> startmpich2.sh: check for mpd daemons (9 of 10)
> startmpich2.sh: check for mpd daemons (10 of 10)
> startmpich2.sh: got only 8 of 2 nodes, aborting
> -catch_rsh /opt/mpich2/gnu
> mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_test_sge_240.undefined); possible causes:
>   1. no mpd is running on this host
>   2. an mpd is running but was started without a "console" (-n option)
> In case 1, you can start an mpd on this host with:
>     mpd &
> and you will be able to run jobs just on this host.
> For more details on starting mpds on a set of hosts, see
> the MPICH2 Installation Guide.
>  

I built the parallel environment with the following script:

cat > template.tmp <<EOT
pe_name            mpich2_mpd
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/sge/mpich2_mpd/startmpich2.sh -catch_rsh \$pe_hostfile /usr/local
stop_proc_args     /opt/sge/mpich2_mpd/stopmpich2.sh -catch_rsh /usr/local
allocation_rule    \$round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
EOT
qconf -Ap template.tmp
rm -f template.tmp

Note: mpd is installed in /usr/local/bin/

Do you have similar values for start_proc_args and stop_proc_args?
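
As a quick sanity check of the last argument, here is a small sketch. It assumes that startmpich2.sh looks for mpd under <last argument>/bin, which is how I read the note above; check_mpd_prefix is just a helper name I made up for this example.

```shell
# check_mpd_prefix PREFIX
# Succeeds only if PREFIX/bin/mpd exists and is executable.
check_mpd_prefix() {
    prefix=$1
    if [ -x "$prefix/bin/mpd" ]; then
        echo "ok: $prefix/bin/mpd"
    else
        echo "missing: $prefix/bin/mpd" >&2
        return 1
    fi
}
```

On a compute node you would call it with the same path that appears at the end of start_proc_args, for example "check_mpd_prefix /usr/local" for my setup or "check_mpd_prefix /opt/mpich2/gnu" for yours. The configured values themselves can be displayed with "qconf -sp mpich2_mpd".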

>  
> Here is my script:
> #!/bin/sh
> #
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -N flat_airebo
> #$ -pe mpich2_mpd 6
> #$ -q all.q
> #$ -e error.out
> #$ -o screen.out
> #
> export MPICH2_ROOT=/opt/mpich2/gnu
> export PATH=$MPICH2_ROOT/bin:$PATH
> export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
> /opt/mpich2/gnu/bin/mpiexec -machinefile $TMPDIR/machines -n $NSLOTS /home/test/mpi-ring
> exit 0
> 
>  I am confused; such an error should not occur. Can anyone give me some advice? Thanks!
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258691
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


-- 
Stéphane VAILLANT

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258699
