[GE users] Questions on SSH tight integration + Rocks OS 4.3

VS Ang vs_ang at yahoo.com
Fri Nov 30 03:38:33 GMT 2007


Ron, thank you for your original response. I have been experimenting with this for the last few days, but I haven't had much success.

First, I tried building the code with the tight-integration procedure outlined in the presentation you sent and in the Tokyo institute paper. I created the patched OpenSSH binaries (installed in /usr/local/openssh-sge) and specified them for the qlogin, etc. commands in the global configuration:

qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
qlogin_daemon                /usr/local/openssh-sge/sbin/sshd -i
rlogin_daemon                /usr/local/openssh-sge/sbin/sshd -i
qrsh_command                 /usr/local/openssh-sge/bin/ssh
rsh_command                  /usr/local/openssh-sge/bin/ssh -t -X
rlogin_command               /usr/local/openssh-sge/bin/ssh
rsh_daemon                   /usr/local/openssh-sge/sbin/sshd -i
qrsh_daemon                  /usr/local/openssh-sge/sbin/sshd

In addition, I set the "enable_addgrp_kill" flag along with the gid_range parameter on the qmaster:

enable_addgrp_kill           true
gid_range                    20000-21000

Now, when I launch MPICH-MX jobs on the cluster, I see the following process tree on the first node, where the mpirun command was started:

root     22638  0.2  0.0 58448 1968 ?        S    22:12   0:01 /opt/gridengine/bin/lx26-amd64/sge_execd
root     26102  0.0  0.0  8508  944 ?        S    22:22   0:00  \_ sge_shepherd-51 -bg
srihari  26142  0.0  0.0 53836 1144 ?        Ss   22:22   0:00      \_ /bin/bash /opt/gridengine/default/spool/compute-1-1/jo
srihari  26144  0.0  0.0 69404 4572 ?        S    22:22   0:00          \_ perl -S -w /home/ibm/util/mpich-mx-1.2.7..1/icc/9.
srihari  26146  0.0  0.0 69404 3800 ?        S    22:22   0:00              \_ perl -S -w /home/ibm/util/mpich-mx-1.2.7..1/ic
srihari  26147  0.2  0.0 18652 2188 ?        S    22:22   0:00              \_ ssh compute-1-1 cd /home2/srihari && exec env
srihari  26148  0.0  0.0 18652 2188 ?        S    22:22   0:00              \_ ssh compute-1-1 -n cd /home2/srihari && exec e
srihari  26149  0.0  0.0 18652 2188 ?        S    22:22   0:00              \_ ssh compute-1-1 -n cd /home2/srihari && exec e
srihari  26150  0.0  0.0 18652 2188 ?        S    22:22   0:00              \_ ssh compute-1-1 -n cd /home2/srihari && exec e
srihari  26151  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-5 -n cd /home2/srihari && exec e
srihari  26152  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-5 -n cd /home2/srihari && exec e
srihari  26153  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-5 -n cd /home2/srihari && exec e
srihari  26159  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-5 -n cd /home2/srihari && exec e
srihari  26161  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-4 -n cd /home2/srihari && exec e
srihari  26164  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-4 -n cd /home2/srihari && exec e
srihari  26165  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-4 -n cd /home2/srihari && exec e
srihari  26166  0.0  0.0 19820 2248 ?        S    22:22   0:00              \_ ssh compute-1-4 -n cd /home2/srihari && exec e

On the other nodes, the process tree looks like this:

root      3525  0.0  0.0 21928 1268 ?        Ss   Nov15   0:02 /usr/sbin/sshd
root     31074  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
srihari  31082  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
srihari  31310  100  0.1 26208 12216 ?       Rsl  22:30   0:29  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
root     31076  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
srihari  31083  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
srihari  31318 99.8  0.1 26208 12216 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
root     31077  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
srihari  31084  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
srihari  31328 99.3  0.1 27232 13252 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
root     31079  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
srihari  31085  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
srihari  31331 99.3  0.1 27232 13252 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
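
One difference between the two listings is that on the first node everything hangs off sge_shepherd, while on the other nodes the IMB-MPI1 processes hang off /usr/sbin/sshd. The rough Python sketch below makes that comparison explicit by walking the parent chain of a process. The parent PIDs are read off the indentation in the listings above, the command strings are abbreviated, PID 1 as the parent of the top-level daemons is an assumption, and the function names are made up for illustration.

```python
def ancestors(pid, table):
    """Yield (pid, command) for each known ancestor of pid, nearest first."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        ppid, _ = table[pid]
        if ppid not in table:
            break  # reached a parent we have no record of (e.g. init)
        yield ppid, table[ppid][1]
        pid = ppid


def under_shepherd(pid, table):
    """True if any known ancestor of pid is an sge_shepherd process."""
    return any("sge_shepherd" in cmd for _, cmd in ancestors(pid, table))


# pid -> (parent pid, abbreviated command), transcribed from the listings.
first_node = {
    22638: (1, "/opt/gridengine/bin/lx26-amd64/sge_execd"),
    26102: (22638, "sge_shepherd-51 -bg"),
    26142: (26102, "/bin/bash"),
    26144: (26142, "perl"),
    26147: (26144, "ssh compute-1-1"),
}
other_node = {
    3525: (1, "/usr/sbin/sshd"),
    31074: (3525, "sshd: srihari [priv]"),
    31082: (31074, "sshd: srihari@notty"),
    31310: (31082, "/home2/srihari/IMB_3.0/src/IMB-MPI1"),
}

print(under_shepherd(26147, first_node))
print(under_shepherd(31310, other_node))
```

On a live node the same table could be built from the output of "ps -eo pid,ppid,args" instead of being typed in by hand.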

However, when I run "qdel" on this job, it doesn't kill all the MPI processes in the tree as I had hoped. So I must still be missing something here.
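
For reference, one way to check whether a given process actually carries one of the additional group IDs is to read the "Groups:" line of /proc/<pid>/status on the node. Below is a minimal Python sketch of that check, assuming a Linux /proc filesystem and the gid_range shown above. The function names are made up for illustration, and the range is treated as inclusive, which is an assumption.

```python
def parse_range(spec):
    """Split a simple 'low-high' range such as '20000-21000' into two ints."""
    low, high = spec.split("-")
    return int(low), int(high)


def supplementary_gids(status_text):
    """Return the group IDs listed on the 'Groups:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(g) for g in line.split()[1:]]
    return []


def sge_job_gids(status_text, gid_range="20000-21000"):
    """Return the supplementary GIDs that fall inside gid_range (assumed inclusive)."""
    low, high = parse_range(gid_range)
    return [g for g in supplementary_gids(status_text) if low <= g <= high]


# Usage on a compute node, with the PID taken from the ps listing:
#   with open("/proc/31310/status") as f:
#       print(sge_job_gids(f.read()))
```

An empty result for a process would suggest that it was not tagged with a group ID from the range, in which case a kill based on the additional group ID would have nothing to match on for that process.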

Srihari

----- Original Message ----
From: Ron Chen <ron_chen_123 at yahoo.com>
To: users at gridengine.sunsource.net
Sent: Monday, November 19, 2007 1:27:54 PM
Subject: Re: [GE users] Questions on SSH tight integration + Rocks OS 4.3


--- VS Ang <vs_ang at yahoo.com> wrote:
> First, it's not clear to me what supplementary group IDs are.
> Also, the release notes refer to the gid_range parameter for 
> the execd "local" configuration on each node. Are these group
> ID ranges supposed to be non-overlapping on each node? 

The "gid_range" is documented in sge_conf(5). As long as the
range is larger than the maximum number of jobs per node, you
don't need to change it.

Also, the "gid_range" of one node can overlap with that of other
nodes, since the group ID space is not shared between nodes.
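
As a small illustration of the sizing rule, the Python sketch below counts the group IDs in a simple "low-high" range and compares that count with the number of jobs a node may run at once. The function names are made up for illustration, and the sketch assumes the range is inclusive at both ends and written as a single "low-high" pair.

```python
def gid_range_size(spec):
    """Number of group IDs in a 'low-high' range, assuming both ends are included."""
    low, high = spec.split("-")
    return int(high) - int(low) + 1


def range_covers_jobs(spec, max_jobs_per_node):
    """True if the range is larger than the maximum number of jobs per node."""
    return gid_range_size(spec) > max_jobs_per_node


print(gid_range_size("20000-21000"))
print(range_covers_jobs("20000-21000", 8))
```

The value 8 for the maximum number of jobs per node is only a placeholder here. The real figure depends on how many jobs the node is configured to run at the same time.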


> Also, when I try to edit the execd configuration on the node
> using qconf, I am getting the following errors. Does it mean
> the gid_range parameter is not supported in this version (even
> though this is 6.0u8)?

"gid_range" was supported even before SGE 5.x.


> 2) >./aimk -gcc -no-java -no-jni -no-qtcsh -spool-classic -tight-ssh
> 
> I would appreciate it if someone can point me to the "right"
> documentation on implementing the SSH tight integration and
> tell me the requirements for building the code with tight
> integration support. 

You can take a look at the SGE workshop presentation:

"SGE-openSSH Tight Integration":

http://gridengine.sunsource.net/download/workshop10-12_09_07/SGE-WS2007-openSSHTightIntegration_RonChen.pdf

 -Ron



> 
> Thank you,
> Srihari
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net



    
