[GE users] mpich/sge, 2, and 4 cpu machine

Jerry Mersel jerry.mersel at weizmann.ac.il
Mon Aug 7 07:20:36 BST 2006



Hi:

  I have machines connected to a grid; some have 2 CPUs and others
  have 4 CPUs (really 2 dual-core).
  I have mpich and SGE working together using tight integration, and on the
  machines with 2 CPUs everything works well. Of the machines that
  have 4 CPUs, some work and some don't.

  On the ones that don't, I get:

Could not find enough machines for architecture LINUX

  If I add -nolocal to the mpirun command it runs, but then I lose
  the tight integration.

  Also, if I run without SGE, ask for 4 CPUs, and list only one 4-CPU
  machine in share/machines.LINUX, mpich runs 3 processes on the remote
  machine and 1 on the local machine (a 2-CPU machine without sge_execd
  running).

  Here is the script (I removed the complete path for clarity):

#!/bin/bash
#$ -o mpi_mpich.stdout
#$ -e mpi_mpich.stderr
#$ -pe mpich 4
#$ -cwd
#$ -S /bin/bash

#$ -v LD_LIBRARY_PATH=/usr/local/mpich/lib:/usr/lib:/usr/local/lib
#$ -v LD_LIBRARY_PATH_64=/usr/lib/sparcv9:/usr/local/lib/sparcv9
export MPICH_PROCESS_GROUP=no
export RSHCOMMAND=/shared/SGE/utilbin/lx24-amd64/rsh
export P4_RSHCOMMAND=/shared/SGE/utilbin/lx24-amd64/rsh

echo "Got $NSLOTS slots."
echo "Tempdir is $TMPDIR."
cat  "$TMPDIR/machines"

/shared/mpich/bin/mpirun -np 4  -machinefile $TMPDIR/machines mitgcmuv
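
  To see what SGE actually hands to mpirun on the nodes that fail, a check
  along the following lines could be added to the script before the mpirun
  call. This is only a minimal sketch: summarize_machines and check_slots
  are hypothetical helper names, and it assumes that $TMPDIR/machines lists
  one host name per line, one line per granted slot.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original job script.
# Assumption: the machine file has one host name per line, one line per slot.

# Print "host count" for every host that appears in the machine file.
summarize_machines() {
    local machinefile="$1"
    sort "$machinefile" | uniq -c | awk '{ print $2, $1 }'
}

# Compare the number of non-empty entries with the number of slots wanted.
check_slots() {
    local machinefile="$1" wanted="$2"
    local have
    have=$(grep -c . "$machinefile")
    if [ "$have" -eq "$wanted" ]; then
        echo "ok: $have entries for $wanted slots"
    else
        echo "mismatch: $have entries for $wanted slots"
    fi
}

# Possible use inside the job script, before mpirun:
#   hostname
#   summarize_machines "$TMPDIR/machines"
#   check_slots "$TMPDIR/machines" "$NSLOTS"
```

  Comparing the output of hostname with the names in the machine file on a
  working 4-CPU node and on a failing one may show whether the two differ.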

Please help.


                                 Regards,
                                   Jerry


