[GE users] PVM, SSh and vendor-specific host file

Bisbal, Prentice PBisbal at LexPharma.com
Tue Oct 3 22:36:13 BST 2006


Reuti, 

It seems that the values for TMPDIR and PVM_TMP aren't getting passed
correctly. What could cause that? 

I am using the same version of OpenSSH for both - version 3.9p1, which I
compiled/installed myself. The only difference is that the OpenSSH on
the Linux systems was built from the Fedora Core 2 SRPM, so there were
some patches included with that SRPM. I looked through the patches
quickly, and don't think they should have an effect. I'm going through
the sshd_config and ssh_config files on all the hosts right now, to make
sure they're all the same. 
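A minimal Python sketch of one thing worth ruling out. The `qrsh` command line in the tight-integration `ps -e f` output quoted below contains `env PVM_TMP=$TMPDIR` literally, so the variable is only substituted if some shell on the receiving side parses that string. The sketch runs the same `env` invocation twice, once through `sh -c` and once as a plain argument vector. It assumes `env` and `printenv` are on `PATH`; the `TMPDIR` value is hypothetical (modeled on the job directory in that output), and the real program path, which is truncated in the `ps` output, is replaced by `printenv` so the result is visible.

```python
import os
import subprocess

# Hypothetical per-job TMPDIR, modeled on /tmp/544.1.all.q from the ps output.
JOB_TMPDIR = "/tmp/544.1.all.q"

# The words passed after "qrsh -V -inherit <host>"; the truncated program path
# is replaced by printenv so we can see what PVM_TMP ends up as.
CHILD = ["env", "PVM_TMP=$TMPDIR", "printenv", "PVM_TMP"]


def pvm_tmp_seen(through_shell: bool) -> str:
    """Return the PVM_TMP value the child process actually receives."""
    env = dict(os.environ, TMPDIR=JOB_TMPDIR)
    if through_shell:
        # A shell parses the line first and replaces $TMPDIR with its value.
        argv = ["sh", "-c", " ".join(CHILD)]
    else:
        # No shell in between: env receives the literal string "$TMPDIR".
        argv = CHILD
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print("via sh -c:", pvm_tmp_seen(True))   # /tmp/544.1.all.q
    print("no shell: ", pvm_tmp_seen(False))  # $TMPDIR
```

If the remote `pvmd3` ends up with `PVM_TMP` set to the literal text `$TMPDIR`, nothing on that side ran the string through a shell; if it is empty, a shell expanded it but `TMPDIR` was not set there.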


-- 
Prentice 

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, October 03, 2006 5:10 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] PVM, SSh and vendor-specific host file

Am 03.10.2006 um 21:24 schrieb Bisbal, Prentice:

> Again, I apologize for top-posting.
>
> `hostname` does return the FQDN for the hosts on all platforms, which 
> is correct. In my original version of the script, I was using `uname 
> -n`, which returns only the shortname on IRIX. That *did* cause 
> problems. All of my systems are set up to use the FQDN, both at the
> operating system level and in SGE (confirmed with the output of 'qconf
> -sel').
>
> IRIX doesn't have a ps command capable of duplicating the tree 
> structure of `ps -e f` on Linux, so here's just the output of 'ps -ef'
> with two processes running (one master and one slave). It's harder to
> look at, but if you look at PID and PPID columns, you can figure out
> what's going on.
>
> This is the output of 'ps -ef' when using loose PVM integration on an 
> IRIX host:
>
> $ ps -ef | egrep "pbisbal|sge" | sort -k 2
> sgeadmin     958543          1  0   Aug 15 ?      12:19 /usr/local/share/sge/bin/irix65/sge_execd
>  pbisbal    1054503    1080539  0 15:09:37 pts/0   0:00 ps -ef
>  pbisbal    1076737          1  0 15:01:24 ?       0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile
>  pbisbal    1080539    1080862  0 10:50:02 pts/0   0:01 -bash
>  pbisbal    1080862    1080822  0 10:50:02 ?       0:01 /opt/sbin/sshd -R
> sgeadmin    1081200     958543  0 15:01:24 ?       0:00 sge_shepherd-542 -bg
>  pbisbal    1081403    1081457  0 15:01:34 ?       7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out
>  pbisbal    1081457    1081200  0 15:01:34 ?       0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542
>  pbisbal    1081549    1076737  0 15:01:34 ?       8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode
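Since IRIX's `ps` has no forest view, the PID/PPID walk described above can be automated. A minimal Python sketch, assuming the SysV `ps -ef` column order shown in the IRIX listing (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and allowing STIME to be either a clock time or a date such as `Aug 15`. The function name `forest` is hypothetical; it rebuilds a `ps -e f`-style tree from some of the lines above.

```python
import re
from collections import defaultdict

# SysV `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
# STIME is either a clock time ("15:01:24") or a date ("Aug 15"), so it is
# matched lazily and anchored by the TTY and TIME fields that follow it.
PS_EF = re.compile(
    r"^\s*(?P<uid>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+\S+\s+"
    r"(?P<stime>.+?)\s+(?P<tty>\S+)\s+(?P<time>\d+:\d+)\s+(?P<cmd>.*)$"
)


def forest(ps_output: str) -> list[str]:
    """Rebuild a `ps -e f`-style tree from `ps -ef` text via PID/PPID links."""
    cmd: dict[int, str] = {}
    parent: dict[int, int] = {}
    children: dict[int, list[int]] = defaultdict(list)
    for line in ps_output.splitlines():
        m = PS_EF.match(line)
        if m is None:
            continue  # header line or something that is not a process row
        pid, ppid = int(m["pid"]), int(m["ppid"])
        cmd[pid] = m["cmd"]
        parent[pid] = ppid
        if pid != ppid:  # guard against self-parented entries
            children[ppid].append(pid)

    lines: list[str] = []

    def walk(pid: int, depth: int) -> None:
        branch = "    " * depth + ("\\_ " if depth else "")
        lines.append(f"{pid:>8} {branch}{cmd[pid]}")
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    # Roots are processes whose parent is not in the listing (e.g. PID 1).
    for pid in sorted(p for p in cmd if parent[p] not in cmd or parent[p] == p):
        walk(pid, 0)
    return lines


SAMPLE = """\
sgeadmin     958543          1  0   Aug 15 ?      12:19 /usr/local/share/sge/bin/irix65/sge_execd
 pbisbal    1076737          1  0 15:01:24 ?       0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile
sgeadmin    1081200     958543  0 15:01:24 ?       0:00 sge_shepherd-542 -bg
 pbisbal    1081403    1081457  0 15:01:34 ?       7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out
 pbisbal    1081457    1081200  0 15:01:34 ?       0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542
 pbisbal    1081549    1076737  0 15:01:34 ?       8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode
"""

if __name__ == "__main__":
    print("\n".join(forest(SAMPLE)))
```

On these lines, the master `omega` lands under the job script and `sge_shepherd-542`, while the slave `omega` lands under a `pvmd3` whose parent is PID 1, which is the loose-integration picture.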

Can you please check which options sshd accepts on IRIX? They might
differ from the Linux ones (I remember an issue on Solaris where -i
wasn't available).

In the worst case, the use of OpenSSH might help.

> Here's how the same job looks on a Linux system with loose PVM
> integration:
>
> $ ps -e f
> 17576 ?        S     10:36 /usr/local/share/sge/bin/lx24-x86/sge_execd
> 20089 ?        S      0:00  \_ sge_shepherd-543 -bg
> 20111 ?        S      0:00      \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/543
> 20116 ?        R      0:11          \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXXXX.in -out XXXXX.out -pvmconf /tmp/543.1.a
> 20108 ?        S      0:00 /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/543.1.all.q/hostfile
> 20117 ?        R      0:14  \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>
> And here's how it looks on a Linux system with tight PVM integration:
> $ ps -e f
> 17576 ?        S     10:37 /usr/local/share/sge/bin/lx24-x86/sge_execd
> 20158 ?        S      0:00  \_ sge_shepherd-544 -bg
> 20200 ?        S      0:00  |   \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/544
> 20206 ?        R      0:09  |       \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXX.in -out XXXXXX.out -pvmconf /tmp/544.1.a
> 20181 ?        S      0:00  \_ sge_shepherd-544 -bg
> 20182 ?        S      0:00      \_ sshd: pbisbal [priv]
> 20185 ?        S      0:00          \_ sshd: pbisbal@notty
> 20186 ?        S      0:00              \_ /usr/local/share/sge/utilbin/lx24-x86/qrsh_starter /var/local/sge/default/spool/hw-appsrv05/active_jobs/5
> 20198 ?        S      0:00                  \_ /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/544.1.all.q/hostfile
> 20207 ?        R      0:10                      \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
> 20178 ?        S      0:00 /usr/local/share/sge/bin/lx24-x86/qrsh -V -inherit hw-appsrv05.lexpharma.com env PVM_TMP=$TMPDIR /usr/local/share/pvm3/li
> 20183 ?        S      0:00  \_ /usr/bin/ssh -x -p 42731 hw-appsrv05.lexpharma.com exec '/usr/local/share/sge/utilbin/lx24-x86/qrsh_starter' '/var/lo

Apart from the accounting being wrong (these processes are missing the
additional group ID, which is what the Tight SSH patch is for), this looks okay.
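On Linux, the supplementary groups of a process are listed on the `Groups:` line of `/proc/<pid>/status`, so one way to see which processes carry the job's additional group ID is to read that line. A minimal Python sketch, Linux-specific; the helper names are hypothetical, and the GID 20017 in the example is made up (the real value is the one assigned to the job from the cluster's `gid_range`).

```python
import os


def parse_groups(status_text: str) -> list[int]:
    """Extract supplementary GIDs from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(g) for g in line.split(":", 1)[1].split()]
    return []


def supplementary_groups(pid: int) -> list[int]:
    """Read the supplementary GIDs of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return parse_groups(status.read())


def missing_job_group(pids: list[int], job_gid: int) -> list[int]:
    """Return the PIDs that do NOT carry the job's additional group ID."""
    return [pid for pid in pids if job_gid not in supplementary_groups(pid)]


if __name__ == "__main__":
    # Hypothetical job GID; checks the calling process as a smoke test.
    print(missing_job_group([os.getpid()], job_gid=20017))
```

In the tight-integration listing above, the candidates would be the processes below `sshd` (such as `pvmd3` and the slave `omega`), compared against one started directly by the shepherd.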

-- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net









