[GE users] PVM, SSh and vendor-specific host file

Reuti reuti at staff.uni-marburg.de
Tue Oct 3 18:03:42 BST 2006


Hi,

On 03.10.2006 at 18:23, Bisbal, Prentice wrote:

> I modified my startpvm.sh script as recommended by Reuti (see below,
> and sorry for top-posting, but I'm forced to use Outlook). I
> borrowed from the startmpi.sh script to create this function:
>
> PeHostfile2OpenEyeHostFile()
> {
>    myname=`hostname`
>    cat $1 | while read line; do
>       # echo $line
>       host=`echo $line|cut -f1 -d" "`
>       nslots=`echo $line|cut -f2 -d" "`
>       i=1
>       if [ "$host" = "$myname" ]; then
>           if [ $nslots -eq 1 ]; then
>               continue
>           elif [ $nslots -gt 1 ]; then
>               nslots=`expr $nslots - 1`
>           fi
>       fi
>       echo "host $host $nslots"
>    done
> }
>
> This function is called later in the script like this (again  
> mimicking startmpi.sh):
>
> oe_hosts="$TMPDIR/oe_hosts"
> PeHostfile2OpenEyeHostFile $pe_hostfile >> $oe_hosts
>
> I've attached a patchfile containing these changes. This works  
> exactly as desired when my PVM PE is configured for loose  
> integration, as described in
> http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html
>
> However, when I switch my PVM PE configuration to tight
> integration, it works fine on my Linux execution hosts, but fails on
> my IRIX 6.5 hosts. I get the following errors in my .pe### file:
>
> $ more tester_tight.sh.pe535
> [pvmd pid257920] 10/03 11:15:29 usage: pvmd3 [-ddebugmask] [-nhostname] [hostfile]
> [pvmd pid257920] 10/03 11:15:29 pvmbailout(0)
> libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
> libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
> libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
> libpvm [pid1066678]: pvm_mytid(): Can't contact local daemon
>
> My job script looks like this:
>
> #!/bin/sh
> PVM_ROOT=/usr/local/share/pvm3
> PVM_ARCH=`$PVM_ROOT/lib/pvmgetarch`
>
> PVM_TMP=$TMPDIR
> export PVM_TMP
>
> $PVM_ROOT/bin/$PVM_ARCH/omega -in XXXXXXX.in -out XXXXXXX.out -pvmconf $TMPDIR/oe_hosts
>
> I did some investigating, and I noticed that the PVM temp dirs do  
> not get created in $TMPDIR. Any idea why this works for Linux, but  
> not IRIX? Again, loose integration works fine. Both architectures  
> are using the same version of PVM compiled/configured the same way.  
> I'm using SSH instead of RSH. Both architectures are using the same  
> version of OpenSSH, but I haven't recompiled it yet with the patch  
> for tight integration. I don't think that's the problem, since  
> tight PVM integration works fine for my Linux systems.
>

Did you check the process tree with:

$ ps -e f

and are all PVM-generated processes on the slave nodes children of an
sge_shepherd when using ssh?
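
Where the forest view is not available, the same check can be made from PID/PPID pairs. The following is only a minimal sketch: the function name check_pvm_parents is made up for illustration, and it assumes that the local ps accepts the POSIX -o option and that the daemon's command name starts with "pvmd".

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or PVM).
# Reads "PID PPID COMMAND" lines on stdin and reports, for every process
# whose command name starts with "pvmd", whether an sge_shepherd is among
# its ancestors.
check_pvm_parents() {
    awk '
        # Skip the header line; remember parent and command for each PID.
        $1 ~ /^[0-9]+$/ { ppid[$1] = $2; cmd[$1] = $3 }
        END {
            for (p in cmd) {
                if (cmd[p] !~ /(^|\/)pvmd/) continue
                status = "NOT under sge_shepherd"
                # Walk up the parent chain; the hop limit guards against loops.
                q = ppid[p]
                for (hops = 0; hops < 64 && (q in cmd); hops++) {
                    if (cmd[q] ~ /sge_shepherd/) {
                        status = "under sge_shepherd"
                        break
                    }
                    q = ppid[q]
                }
                print p, cmd[p], status
            }
        }'
}

# Possible use on a slave node (assumes ps supports -o):
#   ps -e -o pid,ppid,comm | check_pvm_parents
```

A pvmd reported as "NOT under sge_shepherd" would suggest that it was started outside SGE's control on that node.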

Does the command `hostname` behave differently on IRIX compared to
Linux? Are you getting the FQDN there, while SGE and the Linux boxes
were set up to work only with the short hostname?
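
If that turns out to be the cause, one way around it is to compare only the short names on both sides. This is a sketch of such a variant of the quoted function, not a tested fix: short_name and the MYNAME override are hypothetical additions for illustration, and it assumes that the first two fields of each $pe_hostfile line are the host name and the slot count.

```shell
#!/bin/sh
# Hypothetical helper: strip everything from the first dot onwards,
# so "node1.example.com" becomes "node1".
short_name() {
    echo "${1%%.*}"
}

# Variant of the quoted function that compares short host names.
# MYNAME may be preset for testing; otherwise the output of `hostname` is used.
PeHostfile2OpenEyeHostFile() {
    myname=$(short_name "${MYNAME:-$(hostname)}")
    # Each line: host, slot count, then fields that are ignored here.
    while read -r host nslots rest; do
        if [ "$(short_name "$host")" = "$myname" ]; then
            # As in the original: the local host gives up one slot,
            # and is dropped entirely if it had only one.
            [ "$nslots" -le 1 ] && continue
            nslots=$((nslots - 1))
        fi
        echo "host $host $nslots"
    done < "$1"
}
```

The host name is written out exactly as it appears in the PE hostfile; only the comparison uses the shortened form.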

-- Reuti
