[GE users] Tight Integration of the mpd startup method

gqc606 gqc606 at hotmail.com
Wed May 26 15:46:41 BST 2010



Hi,
  I configured and compiled MPICH2, attached the PE to the cluster queue of my choice, and adjusted the path to my MPICH2 installation, all successfully. I followed the instructions on these pages exactly:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html 
http://marc.info/?l=npaci-rocks-discussion&m=127481216829722&w=2
 

According to http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257043, Reuti says that I also have to edit one line of the provided
"startmpich2.sh" script to make it work correctly with Rocks:

   # vi $SGE_ROOT/mpich2_mpd/startmpich2.sh

   Jump down to line 176 where it says:

   NODE=`hostname`

   and change it to:

   NODE=`hostname --short`
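Instead of editing in vi, the same one-line change could be applied non-interactively. This is a minimal sketch, assuming the line in startmpich2.sh reads exactly NODE=`hostname` and a standard sed is available; patch_startmpich2 and short_name are hypothetical helper names, not part of the Howto. short_name only illustrates what the change is for: if `hostname` returns a fully qualified name (for example compute-0-1.local, an assumed Rocks-style name), it no longer matches the short host names such as compute-0-1 that appear in the job log below.

```sh
#!/bin/sh
# Minimal sketch with hypothetical helpers (not from the Howto).
# Assumes the target line reads exactly: NODE=`hostname`

# Replace NODE=`hostname` with NODE=`hostname --short` in the given script.
patch_startmpich2() {
    script=$1
    # Keep a backup copy (with its permissions) as <script>.orig
    cp -p "$script" "$script.orig" || return 1
    # Write the edited text back into the original file; because the file
    # already exists, the redirection keeps its mode (execute bit intact).
    sed 's/NODE=`hostname`/NODE=`hostname --short`/' "$script.orig" > "$script"
}

# Illustration of what --short does: cut the name at the first dot,
# e.g. compute-0-1.local -> compute-0-1 (example name is an assumption).
short_name() {
    printf '%s\n' "${1%%.*}"
}
```

Usage would be `patch_startmpich2 "$SGE_ROOT/mpich2_mpd/startmpich2.sh"`, which leaves the unmodified script next to it as startmpich2.sh.orig.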
I have made that change. Strangely, though, when I submit my script, the following errors occur:
 
-catch_rsh /opt/gridengine/default/spool/compute-0-1/active_jobs/240.1/pe_hostfile /opt/mpich2/gnu
compute-0-1:3
compute-0-0:3
usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
where: 'hostname' gives the name of the target host
usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]
where: 'hostname' gives the name of the target host
startmpich2.sh: check for mpd daemons (1 of 10)
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: check for mpd daemons (3 of 10)
startmpich2.sh: check for mpd daemons (4 of 10)
startmpich2.sh: check for mpd daemons (5 of 10)
startmpich2.sh: check for mpd daemons (6 of 10)
startmpich2.sh: check for mpd daemons (7 of 10)
startmpich2.sh: check for mpd daemons (8 of 10)
startmpich2.sh: check for mpd daemons (9 of 10)
startmpich2.sh: check for mpd daemons (10 of 10)
startmpich2.sh: got only 8 of 2 nodes, aborting
-catch_rsh /opt/mpich2/gnu
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_test_sge_240.undefined); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.
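The start-up output above prints one host:slots pair per pe_hostfile entry (two hosts with 3 slots each, which matches the 6 slots requested with -pe mpich2_mpd 6) before it starts waiting for the mpd daemons. Those expected numbers can be cross-checked by hand. The following is a minimal sketch, assuming the usual pe_hostfile layout of one `<host> <slots> <queue> <processor-range>` line per host; the helper names and the sample file contents are hypothetical, modelled on the host:slots pairs in the log.

```sh
#!/bin/sh
# Minimal sketch: summarise an SGE pe_hostfile (hypothetical helpers).
# Assumed layout, one line per host: <host> <slots> <queue> <processor-range>

# Print "host:slots" for every entry, like the start-up log does.
pe_hosts() {
    awk 'NF >= 2 { print $1 ":" $2 }' "$1"
}

# Number of host entries (nodes) in the file.
pe_node_count() {
    awk 'NF >= 2 { n++ } END { print n + 0 }' "$1"
}

# Total number of slots granted across all hosts.
pe_slot_total() {
    awk 'NF >= 2 { s += $2 } END { print s + 0 }' "$1"
}

# Hypothetical sample modelled on the host:slots pairs in the job log.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
compute-0-1 3 all.q@compute-0-1 UNDEFINED
compute-0-0 3 all.q@compute-0-0 UNDEFINED
EOF

pe_hosts "$sample"
echo "nodes: $(pe_node_count "$sample"), slots: $(pe_slot_total "$sample")"
```

For this sample it prints compute-0-1:3 and compute-0-0:3, followed by "nodes: 2, slots: 6". Pointing it at the real $PE_HOSTFILE inside a job would show what startmpich2.sh should be waiting for.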
 
 
Here is my script:
#!/bin/sh
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -N flat_airebo
#$ -pe mpich2_mpd 6
#$ -q all.q
#$ -e error.out
#$ -o screen.out
#
export MPICH2_ROOT=/opt/mpich2/gnu
export PATH=$MPICH2_ROOT/bin:$PATH
export MPD_CON_EXT="sge_$JOB_ID.$SGE_TASK_ID"
/opt/mpich2/gnu/bin/mpiexec -machinefile $TMPDIR/machines -n $NSLOTS /home/test/mpi-ring
exit 0
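The mpdallexit message shows which console was looked for: /tmp/mpd2.console_test_sge_240.undefined, that is, the user name followed by the MPD_CON_EXT value set in the script, with SGE_TASK_ID expanding to "undefined" for this non-array job. A minimal sketch of a pre-flight check that could sit before the mpiexec line, assuming the console path format inferred from that error message; mpd_console_path and check_mpd_console are hypothetical names.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helpers). The path format is inferred from the
# mpdallexit error: /tmp/mpd2.console_<user>_<MPD_CON_EXT>

# Build the expected console path from a user name and MPD_CON_EXT.
mpd_console_path() {
    printf '/tmp/mpd2.console_%s_%s\n' "$1" "$2"
}

# Succeed if the mpd console socket exists, otherwise report and fail.
check_mpd_console() {
    console=$(mpd_console_path "$1" "$2")
    if [ -S "$console" ]; then
        echo "found mpd console: $console"
    else
        echo "no mpd console at $console" >&2
        return 1
    fi
}
```

In the job script it would be called as `check_mpd_console "$(id -un)" "$MPD_CON_EXT" || exit 1` after MPD_CON_EXT is exported, so a missing mpd ring is reported before mpiexec runs rather than only by the later mpdallexit error.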

I am confused; this error shouldn't occur. Can anyone give me some advice? Thanks!

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258689



