[GE users] PVM, SSH and vendor-specific host file

Reuti reuti at staff.uni-marburg.de
Tue Oct 3 22:10:17 BST 2006


On 03.10.2006 at 21:24, Bisbal, Prentice wrote:

> Again, I apologize for top-posting.
>
> `hostname` does return the FQDN for the hosts on all platforms,  
> which is correct. In my original version of the script, I was using  
> `uname -n`, which returns only the shortname on IRIX. That *did*  
> cause problems. All of my systems are set up to use the FQDN, both
> at the operating system level and in SGE (confirmed with the output  
> of 'qconf -sel').
>
> IRIX doesn't have a ps command capable of duplicating the tree
> structure of `ps -e f` on Linux, so here's just the output of
> 'ps -ef' with two processes running (one master and one slave). It's
> harder to read, but if you look at the PID and PPID columns, you can
> figure out what's going on.
>
> This is the output of 'ps -ef' when using loose PVM integration on  
> an IRIX host:
>
> $ ps -ef | egrep "pbisbal|sge" | sort -k 2
> sgeadmin     958543          1  0   Aug 15 ?      12:19 /usr/local/share/sge/bin/irix65/sge_execd
>  pbisbal    1054503    1080539  0 15:09:37 pts/0   0:00 ps -ef
>  pbisbal    1076737          1  0 15:01:24 ?       0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile
>  pbisbal    1080539    1080862  0 10:50:02 pts/0   0:01 -bash
>  pbisbal    1080862    1080822  0 10:50:02 ?       0:01 /opt/sbin/sshd -R
> sgeadmin    1081200     958543  0 15:01:24 ?       0:00 sge_shepherd-542 -bg
>  pbisbal    1081403    1081457  0 15:01:34 ?       7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out
>  pbisbal    1081457    1081200  0 15:01:34 ?       0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542
>  pbisbal    1081549    1076737  0 15:01:34 ?       8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode
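
For ps implementations without a tree view, the parent/child relationships can be rebuilt from the PID and PPID columns. A minimal sketch in Python, assuming `ps -ef` style input where the first three fields are UID, PID and PPID. The names `build_tree`, `render` and `SAMPLE` are only illustrative, and the sample rows are taken from the IRIX listing above:

```python
from collections import defaultdict

# Rows copied from the IRIX `ps -ef` listing (wrapped lines rejoined).
SAMPLE = [
    "sgeadmin 958543 1 0 Aug 15 ? 12:19 /usr/local/share/sge/bin/irix65/sge_execd",
    "pbisbal 1076737 1 0 15:01:24 ? 0:00 /usr/local/share/pvm3/lib/SGI64/pvmd3 /tmp/542.1.all.q/hostfile",
    "sgeadmin 1081200 958543 0 15:01:24 ? 0:00 sge_shepherd-542 -bg",
    "pbisbal 1081403 1081457 0 15:01:34 ? 7:57 /usr/local/share/pvm3/bin/SGI64/omega -in XXXXXX.in -out XXXXXXX.out",
    "pbisbal 1081457 1081200 0 15:01:34 ? 0:00 /bin/sh /var/local/sge/default/spool/hw-diesel/job_scripts/542",
    "pbisbal 1081549 1076737 0 15:01:34 ? 8:02 /usr/local/share/pvm3/bin/SGI64/omega run_in_pvm_slave_mode",
]


def build_tree(ps_lines):
    """Map each PPID to a list of (pid, uid, rest-of-line) children."""
    children = defaultdict(list)
    for line in ps_lines:
        # Only the first three columns are split; STIME may contain a space
        # ("Aug 15"), so the remainder is kept as one string.
        fields = line.split(None, 3)
        if len(fields) < 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or malformed row
        uid, pid, ppid, rest = fields
        if pid == ppid:
            continue  # a process listed as its own parent would loop forever
        children[int(ppid)].append((int(pid), uid, rest.strip()))
    return children


def render(children, root, depth=0, out=None):
    """Return indented lines for every descendant of `root`."""
    out = [] if out is None else out
    for pid, uid, rest in sorted(children.get(root, [])):
        out.append("  " * depth + f"{pid} {uid} {rest}")
        render(children, pid, depth + 1, out)
    return out


if __name__ == "__main__":
    tree = build_tree(SAMPLE)
    print("\n".join(render(tree, 958543)))   # descendants of sge_execd
    print("\n".join(render(tree, 1076737)))  # descendants of pvmd3
```

With the sample rows, the shepherd, the job script and the master omega appear under sge_execd, while the slave omega only shows up under pvmd3 (PID 1076737), whose parent is PID 1 and which therefore sits outside the sge_execd tree.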

Can you please check which options sshd accepts on IRIX? They may
differ from the Linux ones (I remember an issue on Solaris, where -i
wasn't available).

In the worst case, using OpenSSH might help.

> Here's how the same job looks on a Linux system with loose PVM  
> integration:
>
> $ ps -e f
> 17576 ?        S     10:36 /usr/local/share/sge/bin/lx24-x86/sge_execd
> 20089 ?        S      0:00  \_ sge_shepherd-543 -bg
> 20111 ?        S      0:00      \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/543
> 20116 ?        R      0:11          \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXXXX.in -out XXXXX.out -pvmconf /tmp/543.1.a
> 20108 ?        S      0:00 /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/543.1.all.q/hostfile
> 20117 ?        R      0:14  \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
>
> And here's how it looks on a Linux system with tight PVM integration:
> $ ps -e f
> 17576 ?        S     10:37 /usr/local/share/sge/bin/lx24-x86/sge_execd
> 20158 ?        S      0:00  \_ sge_shepherd-544 -bg
> 20200 ?        S      0:00  |   \_ /bin/sh /var/local/sge/default/spool/hw-appsrv05/job_scripts/544
> 20206 ?        R      0:09  |       \_ /usr/local/share/pvm3/bin/LINUXI386/omega -in XXXXXX.in -out XXXXXX.out -pvmconf /tmp/544.1.a
> 20181 ?        S      0:00  \_ sge_shepherd-544 -bg
> 20182 ?        S      0:00      \_ sshd: pbisbal [priv]
> 20185 ?        S      0:00          \_ sshd: pbisbal@notty
> 20186 ?        S      0:00              \_ /usr/local/share/sge/utilbin/lx24-x86/qrsh_starter /var/local/sge/default/spool/hw-appsrv05/active_jobs/5
> 20198 ?        S      0:00                  \_ /usr/local/share/pvm3/lib/LINUXI386/pvmd3 /tmp/544.1.all.q/hostfile
> 20207 ?        R      0:10                      \_ /usr/local/share/pvm3/bin/LINUXI386/omega run_in_pvm_slave_mode
> 20178 ?        S      0:00 /usr/local/share/sge/bin/lx24-x86/qrsh -V -inherit hw-appsrv05.lexpharma.com env PVM_TMP=$TMPDIR /usr/local/share/pvm3/li
> 20183 ?        S      0:00  \_ /usr/bin/ssh -x -p 42731 hw-appsrv05.lexpharma.com exec '/usr/local/share/sge/utilbin/lx24-x86/qrsh_starter' '/var/lo

Apart from the accounting being wrong (these processes are missing the
additional group ID, hence the Tight SSH patch), this looks okay.
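
On Linux, whether a given process carries the additional group ID can be checked from the `Groups:` line of /proc/<pid>/status. A rough sketch in Python, assuming the cluster's gid_range (as shown by `qconf -sconf`) is 20000-20100; that range and the helper names are illustrative only and need to be adapted to the local configuration:

```python
def supplementary_gids(status_text):
    """Return the GIDs found on the 'Groups:' line of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(gid) for gid in line.split()[1:]]
    return []


def has_sge_gid(status_text, gid_range=(20000, 20100)):
    """True if any supplementary GID lies inside the (assumed) SGE gid_range."""
    low, high = gid_range
    return any(low <= gid <= high for gid in supplementary_gids(status_text))


def check_pid(pid, gid_range=(20000, 20100)):
    """Linux only: read /proc/<pid>/status for a running process and test it."""
    with open(f"/proc/{pid}/status") as handle:
        return has_sge_gid(handle.read(), gid_range)
```

Calling `check_pid` on the pvmd3 or slave omega PID of a running job would then show whether that process can be matched to the job for accounting.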

-- Reuti
