[GE users] Problem with the mpich2 sge integration for an mpiblast run

Matthias Neder matthias.neder at gmail.com
Fri Apr 11 11:29:22 BST 2008



Hi List,

I still have some problems with the integration of MPICH2 into SGE.

First, what I installed:
-Installed SGE 6.0
-Installed mpich2-1.0.7rc1.tar.gz as described here:
http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
following the "Tight Integration of the daemonless smpd startup method"
(I also tried this version, but I will talk about it later:
http://faq.bioteam.net/index.php?action=artikel&cat=5&id=76&artlang=en)

Here is my pe:
========================
[14:52:43-root@HeadNode sge-root]# qconf -sp mpich2_smpd_rsh
pe_name           mpich2_smpd_rsh
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/sge-root/mpich2_smpd_rsh/startmpich2.sh -catch_rsh  \
                  $pe_hostfile
stop_proc_args    /opt/sge-root/mpich2_smpd_rsh/stopmpich2.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

========================
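
(As a side note: to check what the PE actually hands to the start script and
what ends up in the machines file, I would use a small test job roughly like
this -- only a sketch, PE name as above:)
######################
#!/bin/sh
#$ -pe mpich2_smpd_rsh 4
#$ -cwd
# $PE_HOSTFILE and $TMPDIR are set by SGE for a parallel job;
# $TMPDIR/machines is written by startmpich2.sh (shown below).
echo "pe_hostfile:";   cat $PE_HOSTFILE
echo "machines file:"; cat $TMPDIR/machines
######################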

The startmpich2.sh:
##########################
#!/bin/sh
#
#
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.

#
# preparation of the mpi machine file
#
# usage: startmpi.sh [options] <pe_hostfile>
#
#        options are:
#                     -catch_hostname
#                      force use of hostname wrapper in $TMPDIR when starting mpirun
#                     -catch_rsh
#                      force use of rsh wrapper in $TMPDIR when starting mpirun
#                     -unique
#                      generate a machinefile where each hostname appears only once
#                      This is needed to setup a multithreaded mpi application
#

PeHostfile2MachineFile()
{
   cat $1 | while read line; do
      # echo $line
      host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
      nslots=`echo $line|cut -f2 -d" "`
      i=1
      while [ $i -le $nslots ]; do
         # add here code to map regular hostnames into ATM hostnames
         echo $host
         i=`expr $i + 1`
      done
   done
}
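
# Example of what this function does: a pe_hostfile line such as
#   node01 2 all.q@node01 UNDEFINED
# (host, slots, queue, processor range) becomes the short hostname
# repeated once per granted slot:
#   node01
#   node01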


#
# startup of MPI conforming with the Grid Engine
# Parallel Environment interface
#
# on success the job will find a machine-file in $TMPDIR/machines
#

# useful to control parameters passed to us
echo $*

# parse options
catch_rsh=0
catch_hostname=0
unique=0
while [ "$1" != "" ]; do
   case "$1" in
      -catch_rsh)
         catch_rsh=1
         ;;
      -catch_hostname)
         catch_hostname=1
         ;;
      -unique)
         unique=1
         ;;
      *)
         break;
         ;;
   esac
   shift
done

me=`basename $0`

# test number of args
if [ $# -ne 1 ]; then
   echo "$me: got wrong number of arguments" >&2
   exit 1
fi

# get arguments
pe_hostfile=$1

# ensure pe_hostfile is readable
if [ ! -r $pe_hostfile ]; then
   echo "$me: can't read $pe_hostfile" >&2
   exit 1
fi

# create machine-file
# remove column with number of slots per queue
# mpi does not support them in this form
machines="$TMPDIR/machines"

if [ $unique = 1 ]; then
   PeHostfile2MachineFile $pe_hostfile | uniq >> $machines
else
   PeHostfile2MachineFile $pe_hostfile >> $machines
fi

# trace machines file
cat $machines

#
# Make script wrapper for 'rsh' available in jobs tmp dir
#
if [ $catch_rsh = 1 ]; then
   rsh_wrapper=$SGE_ROOT/mpich2_smpd_rsh/rsh
   if [ ! -x $rsh_wrapper ]; then
      echo "$me: can't execute $rsh_wrapper" >&2
      echo "     maybe it resides at a file system not available at this
machine" >&2
      exit 1
   fi

   rshcmd=rsh
   case "$ARC" in
      hp|hp10|hp11|hp11-64) rshcmd=remsh ;;
      *) ;;
   esac
   # note: This could also be done using rcp, ftp or s.th.
   #       else. We use a symbolic link since it is the
   #       cheapest in case of a shared filesystem
   #
   ln -s $rsh_wrapper $TMPDIR/$rshcmd
fi
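
# Note (as far as I understand the tight integration): SGE prepends $TMPDIR
# to the job's PATH, so when the MPICH2 startup later calls plain "rsh" it
# picks up this wrapper, which execs "qrsh -inherit" -- the remote processes
# therefore start under SGE's control (accounting, clean job deletion).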

#
# Make script wrapper for 'hostname' available in jobs tmp dir
#
if [ $catch_hostname = 1 ]; then
   hostname_wrapper=$SGE_ROOT/mpich2_smpd_rsh/hostname
   if [ ! -x $hostname_wrapper ]; then
      echo "$me: can't execute $hostname_wrapper" >&2
      echo "     maybe it resides at a file system not available at this
machine" >&2
      exit 1
   fi

   # note: This could also be done using rcp, ftp or s.th.
   #       else. We use a symbolic link since it is the
   #       cheapest in case of a shared filesystem
   #
   ln -s $hostname_wrapper $TMPDIR/hostname
fi

# signal success to caller
exit 0
##################################################################

If I run a command, I get this:
############################
[12:04:16-root@HeadNode mpiblast]# qrsh -pe mpich2_smpd_rsh 4 mpiexec -n 4 /opt/sge-root/mpich2/examples/cpi
Process 0 of 1 is on Node-192-168-60-171
Process 0 of 1 is on Node-192-168-60-173
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000235
Process 0 of 1 is on Node-192-168-60-169
Process 0 of 1 is on Node-192-168-60-172
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000302
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000248
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000233
[12:04:53-root@HeadNode mpiblast]#

####################
So the job is sent to all nodes at the same time, but each node runs its own
single-process copy (every process reports "Process 0 of 1") instead of one
4-process job.
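
If I read the daemonless-smpd howto right, mpiexec has to be pointed at the
machines file that startmpich2.sh writes into $TMPDIR, otherwise every slave
just starts its own singleton. Roughly what I think the job script should look
like (untested sketch; -rsh and -nopm are the flags the howto uses for the
daemonless smpd mpiexec and may differ for other MPICH2 builds):
######################
#!/bin/sh
#$ -pe mpich2_smpd_rsh 4
#$ -cwd
# $NSLOTS and $TMPDIR are set by SGE, $TMPDIR/machines by startmpich2.sh.
# -machinefile points mpiexec at the hosts granted by SGE;
# -rsh -nopm should make it start the processes via the rsh wrapper
# (and hence qrsh -inherit) instead of contacting smpd daemons.
mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines \
    /opt/sge-root/mpich2/examples/cpi
######################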

I can run cpi by hand with these commands:
mpdboot --ncpus=2 -n 25 -v -f /opt/sge-root/mpiblast/allhostlist &&
/opt/sge-root/mpich2/bin/mpiexec -n 24 ../mpich2/examples/cpi && mpdallexit
The output looks good; pi is calculated across all nodes.
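
(If I wanted to drive this mpd-based startup from inside SGE, I suppose the
manual commands could be wrapped in a job script roughly like below. This is
only a sketch and not the tight integration from the howto, since the mpd
daemons are started by the job itself rather than by the PE:)
######################
#!/bin/sh
#$ -pe mpich2_smpd_rsh 8
#$ -cwd
# Build a unique host list from the machines file written by startmpich2.sh.
sort -u $TMPDIR/machines > $TMPDIR/mpd.hosts
NHOSTS=`wc -l < $TMPDIR/mpd.hosts`
# mpdboot counts the local host in -n, so this may need adjusting.
/opt/sge-root/mpich2/bin/mpdboot -n $NHOSTS -f $TMPDIR/mpd.hosts -v
/opt/sge-root/mpich2/bin/mpiexec -n $NSLOTS /opt/sge-root/mpich2/examples/cpi
/opt/sge-root/mpich2/bin/mpdallexit
######################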

I also tried the "Tight Integration of the daemon-based smpd startup method"
version, with this pe:
######################
[12:23:27-root@HeadNode mpiblast]# qconf -sp mpich2_smpd
pe_name           mpich2_smpd
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /opt/sge-root/mpich2_smpd/startmpich2.sh -catch_rsh \
                  $pe_hostfile /opt/sge-root/mpich2_smpd
stop_proc_args    /opt/sge-root/mpich2_smpd/stopmpich2.sh -catch_rsh \
                  /opt/sge-root/mpich2_smpd
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
######################
If I start a job with this pe, I get this:
##########################
[12:23:56-root@HeadNode mpiblast]# qrsh -pe mpich2_smpd 4 /opt/sge-root/mpich2_smpd/bin/mpiexec -n 4 /opt/sge-root/mpich2/examples/cpi
op_connect error: socket connection failed, error stack:
MPIDU_Socki_handle_connect(791): connection failure
(set=1,sock=16777216,errno=111:Connection refused)
unable to connect mpiexec tree, socket connection failed, error stack:
MPIDU_Socki_handle_connect(791): connection failure
(set=1,sock=16777216,errno=111:Connection refused).
[12:24:30-root@HeadNode mpiblast]#
###################
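
(The "Connection refused" looks to me like mpiexec cannot reach the smpd
daemons that startmpich2.sh is supposed to start on the nodes -- either they
are not running, or mpiexec is not using the per-job port that the start
script chose, if I remember the howto right. A quick check I would try on one
of the allocated nodes while the job is active -- just a sketch, 20000 is only
a placeholder for whatever port your startmpich2.sh computes:)
######################
ps -ef | grep [s]mpd        # is an smpd daemon running at all?
netstat -ltn | grep 20000   # is it listening on the expected port?
######################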

So I am a bit confused.

I can get the mpd ring running and cpi runs fine in the ring. But the
integration fails both ways: one time it starts cpi independently on every
node, the other time it fails completely.
Does anyone have an idea for me? Or better: which way of integration should I
use for mpiblast? Which one is best for mpiblast?

Thanks in advance.
Matthias


