[GE users] PVM, SSh and vendor-specific host file

Bisbal, Prentice PBisbal at LexPharma.com
Thu Oct 5 21:31:56 BST 2006



Which rsh-wrapper are you referring to? Do you mean $SGE_ROOT/pvm/rsh? 
If so, I made the change you requested. I looked through that script, and didn't see any bash-specific code. It all looked like standard Bourne-shell code. 

I think I found a clue. Look at the SSH error on the first line below, from the .pe### file of the failed job:

head tester_tight.sh.pe579 
ssh_exchange_identification: Connection closed by remote host
libpvm [pid1195109] /tmp/579.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1195109] /tmp/579.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1195109] /tmp/579.1.all.q/pvmd.2500: No such file or directory
libpvm [pid1195109]: pvm_mytid(): Can't contact local daemon
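
To check whether other failed jobs hit the same SSH error, something like the following might help. It is only a sketch: first_ssh_error is a made-up helper name, and it assumes the .pe### files are readable from the directory where it is run.

```shell
# Hypothetical helper: for each file given as an argument, print
# "file:line" for the first line containing the SSH handshake error
# quoted above. Files without the error print nothing.
first_ssh_error() {
    for f in "$@"; do
        # grep -n prefixes each match with its line number; keep the first.
        n=`grep -n 'ssh_exchange_identification' "$f" | head -n 1 | cut -d: -f1`
        [ -n "$n" ] && echo "$f:$n"
    done
    return 0
}
```

It could be called as: first_ssh_error tester_tight.sh.pe*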

I googled the SSH error and recompiled SSH without TCP wrappers, which was the only advice I could find for this error. When SGE encounters this error, is it running under my username, or as the sgeadmin user (the user sge_execd runs as)? I've turned the log level up all the way on the IRIX execution hosts, but haven't found any useful error messages there. 
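
One way to answer the username question, and to see whether TMPDIR and PVM_TMP actually arrive, might be to have the wrapper record its own environment. This is a minimal sketch, not something taken from the SGE scripts: log_wrapper_env and the log path are made up, and it uses backticks so it should also run under a plain Bourne shell.

```shell
# Hypothetical debugging helper: append the identity of the calling user
# and the values of TMPDIR and PVM_TMP to the log file named in $1.
log_wrapper_env() {
    {
        echo "date=`date`"
        echo "user=`id`"                      # effective user and groups
        echo "host=`hostname`"
        echo "TMPDIR=${TMPDIR:-<unset>}"      # "<unset>" marks a missing value
        echo "PVM_TMP=${PVM_TMP:-<unset>}"
    } >> "$1"
}
```

Calling it near the top of the wrapper, for example log_wrapper_env /tmp/rsh-wrapper-debug.log (a path chosen only for illustration), would show which user the wrapper runs as on each host.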

Prentice 


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Wed 10/4/2006 6:38 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] PVM, SSh and vendor-specific host file
 
Am 03.10.2006 um 23:36 schrieb Bisbal, Prentice:

> Reuti,
>
> It seems that the values for TMPDIR and PVM_TMP aren't getting passed
> correctly. What could cause that?
>

Which versions of sh and bash are installed on IRIX? Can you try
editing the first line of the rsh-wrapper to read:

#!/bin/bash

Under Linux, sh is most often a link to bash. Maybe not on IRIX. - Reuti
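
A quick way to see which interpreter a wrapper currently asks for could look like this. It is only a sketch, and script_interpreter is a made-up helper name, not part of SGE.

```shell
# Hypothetical helper: print the interpreter named on the "#!" line of the
# script given as $1, or nothing if the first line is not a "#!" line.
script_interpreter() {
    # Take the first line, strip "#!" plus optional spaces, and keep only
    # the interpreter path (dropping any arguments after it).
    head -n 1 "$1" | sed -n 's/^#! *//p' | awk '{print $1}'
}
```

For example: script_interpreter "$SGE_ROOT/pvm/rsh". In addition, ls -l /bin/sh shows whether sh is a link to another shell, and sh -n on the wrapper parses it without executing it, which should expose syntax the local sh does not accept.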


> I am using the same version of OpenSSH for both - version 3.9p1, which
> I compiled/installed myself. The only difference is that the OpenSSH on
> the Linux systems was built from the Fedora Core 2 SRPM, so there were
> some patches included with that SRPM. I looked through the patches
> quickly, and don't think they should have an effect. I'm going through
> the sshd_config and ssh_config files on all the hosts right now, to
> make sure they're all the same.
>
>
> -- 
> Prentice
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, October 03, 2006 5:10 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] PVM, SSh and vendor-specific host file
>
> Am 03.10.2006 um 21:24 schrieb Bisbal, Prentice:
>
>> Again, I apologize for top-posting.
>>
>> `hostname` does return the FQDN for the hosts on all platforms, which
>> is correct. In my original version of the script, I was using `uname
>> -n`, which returns only the short name on IRIX. That *did* cause
>> problems. All of my systems are set up to use the FQDN, both at the
>> operating system level and in SGE (confirmed with the output of
>> 'qconf -sel').
>>
>> IRIX doesn't have a ps command capable of duplicating the tree
>> structure of `ps -e f` on Linux, so here's just the output of
>> 'ps -ef' with two processes running (one master and one slave). It's
>> harder to look at, but if you look at the PID and PPID columns, you
>> can figure out what's going on.
>>
>> This is the output of 'ps -ef' when using loose PVM integration on an
>> IRIX host:
>>
>> $ ps -ef | egrep "pbisbal|sge" | sort -k 2
>> sgeadmin     958543          1  0   Aug 15 ?      12:19 /usr/local/share/sge/bin/irix65/sge_execd
>>  pbisbal    1054503    1080539  0 15:09:37 pts/0   0:00 ps -ef
>>  pbisbal    1076737          1  0 15:01:24 ?       0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile
>>  pbisbal    1080539    1080862  0 10:50:02 pts/0   0:01 -bash
>>  pbisbal    1080862    1080822  0 10:50:02 ?       0:01 /opt/sbin/sshd -R
>> sgeadmin    1081200     958543  0 15:01:24 ?       0:00 sge_shepherd-542 -bg
>>  pbisbal    1081403    1081457  0 15:01:34 ?       7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out
>>  pbisbal    1081457    1081200  0 15:01:34 ?       0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542
>>  pbisbal    1081549    1076737  0 15:01:34 ?       8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode
>
> Can you please check the available options to sshd on IRIX? It might
> be that they are different from the Linux ones (I remember an issue on
> Solaris, where -i wasn't available).
>
> In the worst case, the use of OpenSSH might help.
>
>> Here's how the same job looks on a Linux system with loose PVM
>> integration:
>>
>> $ ps -e f
>> 17576 ?        S     10:36 /usr/local/share/sge/bin/lx24-x86/sge_execd
>> 20089 ?        S      0:00  \_ sge_shepherd-543 -bg
>> 20111 ?        S      0:00      \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/543
>> 20116 ?        R      0:11          \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXXXX.in -out XXXXX.out -pvmconf /tmp/543.1.a
>> 20108 ?        S      0:00 /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/543.1.all.q/hostfile
>> 20117 ?        R      0:14  \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>>
>> And here's how it looks on a Linux system with tight PVM integration:
>> $ ps -e f
>> 17576 ?        S     10:37 /usr/local/share/sge/bin/lx24-x86/sge_execd
>> 20158 ?        S      0:00  \_ sge_shepherd-544 -bg
>> 20200 ?        S      0:00  |   \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/544
>> 20206 ?        R      0:09  |       \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXX.in -out XXXXXX.out -pvmconf /tmp/544.1.a
>> 20181 ?        S      0:00  \_ sge_shepherd-544 -bg
>> 20182 ?        S      0:00      \_ sshd: pbisbal [priv]
>> 20185 ?        S      0:00          \_ sshd: pbisbal@notty
>> 20186 ?        S      0:00              \_ /usr/local/share/sge/utilbin/lx24-x86/qrsh_starter /var/local/sge/default/spool/hw-appsrv05/active_jobs/5
>> 20198 ?        S      0:00                  \_ /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/544.1.all.q/hostfile
>> 20207 ?        R      0:10                      \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>> 20178 ?        S      0:00 /usr/local/share/sge/bin/lx24-x86/qrsh -V -inherit hw-appsrv05.lexpharma.com env PVM_TMP=$TMPDIR /usr/local/share/pvm3/li
>> 20183 ?        S      0:00  \_ /usr/bin/ssh -x -p 42731 hw-appsrv05.lexpharma.com exec '/usr/local/share/sge/utilbin/lx24-x86/qrsh_starter' '/var/lo
>
> Besides the accounting being wrong (the additional group ID is missing
> for these processes, hence the Tight SSH patch), this looks okay.
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>
>
>
> The contents of this communication, including any attachments, may  
> be confidential, privileged or otherwise protected from  
> disclosure.  They are intended solely for the use of the individual  
> or entity to whom they are addressed.  If you are not the intended  
> recipient, please do not read, copy, use or disclose the contents  
> of this communication.  Please notify the sender immediately and  
> delete the communication in its entirety.
>
