[GE users] PVM, SSh and vendor-specific host file

Bisbal, Prentice PBisbal at LexPharma.com
Tue Oct 3 17:23:10 BST 2006



I modified my startpvm.sh script as recommended by Reuti (see below, and sorry for top-posting, but I'm forced to use Outlook). I borrowed from the startmpi.sh script to create this function:

PeHostfile2OpenEyeHostFile()
{
   # $1 is the SGE pe_hostfile; fields 1 and 2 of each line are host and slot count
   myname=`hostname`
   while read line; do
      host=`echo $line|cut -f1 -d" "`
      nslots=`echo $line|cut -f2 -d" "`
      if [ "$host" = "$myname" ]; then
          # on the local host, use one slot fewer; skip the host if it has only one
          if [ $nslots -eq 1 ]; then
              continue
          elif [ $nslots -gt 1 ]; then
              nslots=`expr $nslots - 1`
          fi
      fi
      echo "host $host $nslots"
   done < "$1"
}

This function is called later in the script like this (again mimicking startmpi.sh):

oe_hosts="$TMPDIR/oe_hosts"
PeHostfile2OpenEyeHostFile $pe_hostfile >> $oe_hosts
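
To illustrate what the conversion produces, here is a small self-contained sketch. The host names (nodeA, nodeB), the queue names and the sample file are made up, and the four-column layout (host, slots, queue, processor range) is an assumption about how SGE writes the pe_hostfile. The function is a compact variant for illustration, not the attached patch, and it takes the local host name as a second argument (a hypothetical parameter) so it can be tried outside a job:

```shell
#!/bin/sh
# Sketch only: convert a pe_hostfile to "host <name> <nprocs>" lines.
# $1 = pe_hostfile, $2 = name of the local host (hypothetical parameter).
pe_to_oe_hosts()
{
   while read host nslots rest; do
      if [ "$host" = "$2" ]; then
         # local host: skip it if it has only one slot, else use one fewer
         if [ "$nslots" -le 1 ]; then
            continue
         fi
         nslots=`expr $nslots - 1`
      fi
      echo "host $host $nslots"
   done < "$1"
}

# made-up sample input with two hosts
sample=${TMPDIR:-/tmp}/pe_hostfile.sample.$$
cat > "$sample" <<EOF
nodeA 2 all.q@nodeA UNDEFINED
nodeB 1 all.q@nodeB UNDEFINED
EOF

# with nodeA as the local host this prints:
#   host nodeA 1
#   host nodeB 1
pe_to_oe_hosts "$sample" nodeA
rm -f "$sample"
```

If nodeB were the local host instead, it would be dropped from the output entirely, since its only slot is the one being subtracted.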

I've attached a patchfile containing these changes. This works exactly as desired when my PVM PE is configured for loose integration, as described in 
http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html

However, when I switch my PVM PE configuration to tight integration, it works fine on my Linux execution hosts but fails on my IRIX 6.5 hosts. I get the following errors in my .pe### file:

$ more tester_tight.sh.pe535
[pvmd pid257920] 10/03 11:15:29 usage: pvmd3 [-ddebugmask] [-nhostname] [hostfile]
[pvmd pid257920] 10/03 11:15:29 pvmbailout(0)
libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1066678] /tmp/535.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1066678]: pvm_mytid(): Can't contact local daemon

My job script looks like this:

#!/bin/sh
PVM_ROOT=/usr/local/share/pvm3
PVM_ARCH=`$PVM_ROOT/lib/pvmgetarch`

PVM_TMP=$TMPDIR
export PVM_TMP

$PVM_ROOT/bin/$PVM_ARCH/omega -in XXXXXXX.in -out XXXXXXX.out -pvmconf $TMPDIR/oe_hosts

I did some investigating, and I noticed that the PVM temp dirs do not get created in $TMPDIR. Any idea why this works for Linux, but not IRIX? Again, loose integration works fine. Both architectures are using the same version of PVM compiled/configured the same way. I'm using SSH instead of RSH. Both architectures are using the same version of OpenSSH, but I haven't recompiled it yet with the patch for tight integration. I don't think that's the problem, since tight PVM integration works fine for my Linux systems.
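
A quick check that could be run from inside the job on both architectures is to test whether the pvmd file exists where libpvm is looking for it. This is only a rough sketch: it assumes the file is named pvmd.<uid>, as the error messages above suggest (pvmd.2500 in /tmp/535.1.all.q), and the function name is made up:

```shell
#!/bin/sh
# Sketch: report whether a pvmd.<uid> file exists in the given directory.
# Assumes the naming seen in the error output; check_pvmd_file is a made-up name.
# $1 = directory to look in, $2 = numeric uid (defaults to the current user)
check_pvmd_file()
{
   dir=$1
   uid=${2:-`id -u`}
   if ls "$dir/pvmd.$uid" >/dev/null 2>&1; then
      echo "found $dir/pvmd.$uid"
   else
      echo "missing $dir/pvmd.$uid"
      # show what is actually in the directory, if it exists at all
      ls -la "$dir" 2>/dev/null
      return 1
   fi
}

check_pvmd_file "${PVM_TMP:-${TMPDIR:-/tmp}}" || echo "no pvmd file for this user in that directory"
```

Running the same check against /tmp as well would show whether pvmd was started but ignored PVM_TMP, or was never started at all.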


Here's some additional configuration information:

$ qconf -sp pvm
pe_name           pvm
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /usr/local/share/sge/pvm/startpvm.sh -catch_rsh $pe_hostfile \
                  $host /usr/local/share/pvm3
stop_proc_args    /usr/local/share/sge/pvm/stoppvm.sh -catch_rsh $pe_hostfile \
                  $host
allocation_rule   $fill_up
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

Configuration for one of my IRIX 6.5 hosts:

$ qconf -sconf irix1
irix1.lexpharma.com:
mailer                       /usr/sbin/Mail
xterm                        /usr/bin/X11/xterm
qlogin_daemon                /opt/bin/sshd -i
rlogin_daemon                /opt/bin/sshd -i
rsh_daemon                   /opt/bin/sshd -i
execd_spool_dir              /var/local/sge/default/spool
rsh_command                  /opt/bin/ssh -x
rlogin_command               /opt/bin/ssh
qlogin_command               /usr/local/share/sge/ssh/qlogin_wrapper.irix65.sh

My Global configuration:

$ qconf -sconf global
global:
execd_spool_dir              /var/local/sge/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           root at lexpharma.com
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               telnet
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_daemon                /usr/sbin/in.rlogind
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0


Prentice 



-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Thu 9/28/2006 4:57 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] PVM, SSh and vendor-specific host file
 
Hi,

Am 28.09.2006 um 22:00 schrieb Bisbal, Prentice:

> I'm using SGE 6.0u8 with ssh, and PVM 3.4.5 with loose integration.  
> I didn't recompile my SSH with the tight-integration patch, so I'm  
> using ssh "loosely integrated"
>
> PVM+SGE works fine with the sample applications that come with the  
> PVM distribution (helloh, timings, for example). I'm having trouble  
> getting a commercial PVM application to work. The application  
> expects to have the hostfile specified to it as a commandline  
> argument:
>
> foo -pvmconf pvmconfigfile
>
> To make matters worse, pvmconfigfile has a non-standard syntax of  
> the form:
>
> host <hostname> <# processors to use>
>
> So if I have hostA and hostB and I want to use 2 processors on  
> each, the file would look like this:
>
> host hostA 2
> host hostB 2
>
> I have 3 questions:
>
> 1) Is it possible to reference the pvm pe_hostfile from within my  
> job script to create a new file with the desired format. For  
> example, can I do something like
>
> for host in `cat $pe_hostfile`; do
>         echo "host $host 2" >> pvmconfigfile
> done
>
> foo -pvmconf pvmconfigfile
>
> in my submit script? If not, how else can I get this to work? If I  
> omit the -pvmconf switch, the application runs in single-processor  
> mode.
Yes, you can do this to ensure that the correct number of processes  
is started on each node. But the pseudo variable $pe_hostfile is only  
known as an argument to start_proc_args. You can either use  
$PE_HOSTFILE in the jobscript (uppercase), or prepare the custom  
pvmconfigfile in startpvm.sh, which would be the best place IMO  
and hidden from the user. You could do it as it's done for the MPI  
integration, i.e. write the modified file to $TMPDIR/machines and use  
it in your jobscript:

foo -pvmconf $TMPDIR/machines
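
For the jobscript variant, a minimal sketch could look like the following. It assumes that the first two columns of $PE_HOSTFILE are the host name and the number of granted slots; write_pvmconf is a made-up helper name, and foo stands for the application as in the question:

```shell
#!/bin/sh
# Sketch: build the application's host file inside the job script.
# write_pvmconf is a made-up helper; $1 = pe_hostfile, $2 = file to write.
write_pvmconf()
{
   # column 1 = host name, column 2 = number of slots granted on that host
   awk '{ print "host", $1, $2 }' "$1" > "$2"
}

# only do something when running inside a parallel SGE job
if [ -n "${PE_HOSTFILE:-}" ]; then
   write_pvmconf "$PE_HOSTFILE" "$TMPDIR/machines"
   # foo -pvmconf $TMPDIR/machines
fi
```

Reading the file line by line matters here: a `for host in` loop over the output of cat splits on every word, so it would also iterate over the slot counts and queue names, not just the host names.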
>
>
> 2) Will specifying the hosts in a config file like this break the  
> interactions between SGE and PVM?
No, in fact your application is behaving in a nice way, as it will  
supersede the PVM built-in round-robin for the task distribution,  
which would need an equal number of slots on all involved nodes.  
(Unless your application is adding the given hosts on its own and  
trying to start the pvmds there.)
> 3) Will my use of SSH interfere with PVM in any way, especially when  
> I switch to PVM with tight-integration?
>
Yes. If you feel the need to use ssh, then you have to recompile it  
with the ssh-patch and follow the PVM Howto for tight integration.

HTH - Reuti

(For loose integration please be sure to set PVM_VMID to allow  
more than one pvmd per node, each belonging to a different job.)
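
A minimal sketch of such a setting follows. Using the SGE job id ($JOB_ID) as the value is an assumption, not something prescribed above, and set_pvm_vmid is a made-up name. The same value would presumably have to be in effect both where the pvmd is started and in the jobscript:

```shell
#!/bin/sh
# Sketch: derive a per-job PVM virtual machine id.
# Assumption: the SGE job id ($JOB_ID) is used as the id; set_pvm_vmid is a made-up name.
set_pvm_vmid()
{
   if [ -n "${JOB_ID:-}" ]; then
      PVM_VMID=$JOB_ID
      export PVM_VMID
   fi
}

set_pvm_vmid
echo "PVM_VMID=${PVM_VMID:-unset}"
```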

> Thanks for your help.
> -- 
> Prentice
>
>
>
> The contents of this communication, including any attachments, may  
> be confidential, privileged or otherwise protected from disclosure.  
> They are intended solely for the use of the individual or entity to  
> whom they are addressed. If you are not the intended recipient,  
> please do not read, copy, use or disclose the contents of this  
> communication. Please notify the sender immediately and delete the  
> communication in its entirety.
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net








