[GE users] PVM, SSh and vendor-specific host file

Reuti reuti at staff.uni-marburg.de
Wed Oct 4 23:38:53 BST 2006


On 03.10.2006 at 23:36, Bisbal, Prentice wrote:

> Reuti,
>
> It seems that the values for TMPDIR and PVM_TMP aren't getting passed
> correctly. What could cause that?
>

Which versions of sh and bash are installed on IRIX? Can you try
editing the first line of the rsh-wrapper to read:

#!/bin/bash

Under Linux, sh is most often a link to bash. That may not be the case
on IRIX.

-- Reuti
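
A quick way to check this on each host is sketched below. It is only a
minimal sketch: describe_shell is a hypothetical helper name, not part
of SGE or PVM, and it assumes the shell under test accepts -c like any
Bourne-style shell. It relies on the fact that bash sets BASH_VERSION
even when it is started under the name sh.

```shell
#!/bin/sh
# describe_shell PATH
# Hypothetical helper: report whether the shell at PATH is really bash.
describe_shell() {
    shell_path=$1
    if [ ! -x "$shell_path" ]; then
        echo "$shell_path: not found or not executable"
        return 1
    fi
    # bash sets BASH_VERSION even when invoked as sh;
    # other Bourne-style shells leave it unset, so the output is empty.
    bash_version=`"$shell_path" -c 'echo "$BASH_VERSION"' 2>/dev/null`
    if [ -n "$bash_version" ]; then
        echo "$shell_path: bash $bash_version"
    else
        echo "$shell_path: not bash"
    fi
}

describe_shell /bin/sh
```

If /bin/sh reports "not bash" on IRIX but "bash ..." on Linux, the two
hosts are running the wrapper under different shells, and changing the
first line of the wrapper as suggested above would be the thing to try.
The same function can be pointed at /bin/bash (or wherever bash lives
on IRIX) to confirm it is installed.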


> I am using the same version of OpenSSH for both - version 3.9p1, which I
> compiled/installed myself. The only difference is that the OpenSSH on
> the Linux systems was built from the Fedora Core 2 SRPM, so there were
> some patches included with that SRPM. I looked through the patches
> quickly, and don't think they should have an effect. I'm going through
> the sshd_config and ssh_config files on all the hosts right now, to make
> sure they're all the same.
>
>
> -- 
> Prentice
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, October 03, 2006 5:10 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] PVM, SSh and vendor-specific host file
>
> On 03.10.2006 at 21:24, Bisbal, Prentice wrote:
>
>> Again, I apologize for top-posting.
>>
>> `hostname` does return the FQDN for the hosts on all platforms, which
>> is correct. In my original version of the script, I was using
>> `uname -n`, which returns only the shortname on IRIX. That *did* cause
>> problems. All of my systems are set up to use the FQDN, both at the
>> operating system level and in SGE (confirmed with the output of
>> 'qconf -sel').
>>
>> IRIX doesn't have a ps command capable of duplicating the tree
>> structure of `ps -e f` on Linux, so here's just the output of
>> 'ps -ef' with two processes running (one master and one slave). It's
>> harder to look at, but if you look at the PID and PPID columns, you
>> can figure out what's going on.
>>
>> This is the output of 'ps -ef' when using loose PVM integration on an
>> IRIX host:
>>
>> $ ps -ef | egrep "pbisbal|sge" | sort -k 2
>> sgeadmin     958543          1  0   Aug 15 ?      12:19 /usr/local/share/sge/bin/irix65/sge_execd
>>  pbisbal    1054503    1080539  0 15:09:37 pts/0   0:00 ps -ef
>>  pbisbal    1076737          1  0 15:01:24 ?       0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile
>>  pbisbal    1080539    1080862  0 10:50:02 pts/0   0:01 -bash
>>  pbisbal    1080862    1080822  0 10:50:02 ?       0:01 /opt/sbin/sshd -R
>> sgeadmin    1081200     958543  0 15:01:24 ?       0:00 sge_shepherd-542 -bg
>>  pbisbal    1081403    1081457  0 15:01:34 ?       7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out
>>  pbisbal    1081457    1081200  0 15:01:34 ?       0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542
>>  pbisbal    1081549    1076737  0 15:01:34 ?       8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode
>
> Can you please check the available options to sshd on IRIX? They might
> be different from the Linux ones (I remember an issue on Solaris, where
> -i wasn't available).
>
> In the worst case, the use of OpenSSH might help.
>
>> Here's how the same job looks on a Linux system with loose PVM
>> integration:
>>
>> $ ps -e f
>> 17576 ?        S     10:36 /usr/local/share/sge/bin/lx24-x86/sge_execd
>> 20089 ?        S      0:00  \_ sge_shepherd-543 -bg
>> 20111 ?        S      0:00      \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/543
>> 20116 ?        R      0:11          \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXXXX.in -out XXXXX.out -pvmconf /tmp/543.1.a
>> 20108 ?        S      0:00 /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/543.1.all.q/hostfile
>> 20117 ?        R      0:14  \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>>
>> And here's how it looks on a Linux system with tight PVM integration:
>> $ ps -e f
>> 17576 ?        S     10:37 /usr/local/share/sge/bin/lx24-x86/sge_execd
>> 20158 ?        S      0:00  \_ sge_shepherd-544 -bg
>> 20200 ?        S      0:00  |   \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/544
>> 20206 ?        R      0:09  |       \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXX.in -out XXXXXX.out -pvmconf /tmp/544.1.a
>> 20181 ?        S      0:00  \_ sge_shepherd-544 -bg
>> 20182 ?        S      0:00      \_ sshd: pbisbal [priv]
>> 20185 ?        S      0:00          \_ sshd: pbisbal@notty
>> 20186 ?        S      0:00              \_ /usr/local/share/sge/utilbin/lx24-x86/qrsh_starter /var/local/sge/default/spool/hw-appsrv05/active_jobs/5
>> 20198 ?        S      0:00                  \_ /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/544.1.all.q/hostfile
>> 20207 ?        R      0:10                      \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>> 20178 ?        S      0:00 /usr/local/share/sge/bin/lx24-x86/qrsh -V -inherit hw-appsrv05.lexpharma.com env PVM_TMP=$TMPDIR /usr/local/share/pvm3/li
>> 20183 ?        S      0:00  \_ /usr/bin/ssh -x -p 42731 hw-appsrv05.lexpharma.com exec '/usr/local/share/sge/utilbin/lx24-x86/qrsh_starter' '/var/lo
>
> Apart from the accounting being wrong (these processes are missing the
> additional group ID, hence the Tight SSH patch), this looks okay.
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net