[GE users] Questions on SSH tight integration + Rocks OS 4.3

Reuti reuti at staff.uni-marburg.de
Fri Nov 30 13:48:11 GMT 2007


Hi,

On 30.11.2007 at 04:38, VS Ang wrote:

> Ron, thank you for your original response. I have been  
> experimenting with this the last few days. However, I haven't had  
> much success.
>
> First, I tried building the code with the tight-integration  
> procedure that was outlined in the presentation you sent and also  
> the Tokyo institute paper. I created the patched OpenSSH binaries,  
> and also specified these for the qlogin, etc. commands in the  
> global configuration. (I installed the patched OpenSSH in
> /usr/local/openssh-sge.)
>
> qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
> qlogin_daemon                /usr/local/openssh-sge/sbin/sshd -i
> rlogin_daemon                /usr/local/openssh-sge/sbin/sshd -i
> qrsh_command                 /usr/local/openssh-sge/bin/ssh
> rsh_command                  /usr/local/openssh-sge/bin/ssh -t -X
> rlogin_command               /usr/local/openssh-sge/bin/ssh
> rsh_daemon                   /usr/local/openssh-sge/sbin/sshd -i
> qrsh_daemon                  /usr/local/openssh-sge/sbin/sshd

qrsh_command
qrsh_daemon

shouldn't be necessary to set.
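
To spot such entries, a small filter over the output of "qconf -sconf" can list the *_command and *_daemon keys that are not among the six documented in sge_conf(5) (qlogin, rlogin and rsh, each with _command and _daemon). This is only a sketch; the function name is made up for illustration:

```shell
# Hypothetical helper: print configuration keys that end in _command or
# _daemon but are not one of the six documented in sge_conf(5).
unknown_startup_keys() {
    awk '$1 ~ /_(command|daemon)$/ &&
         $1 !~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ { print $1 }'
}

# Usage on the qmaster or any admin host:
#   qconf -sconf | unknown_startup_keys
```

With the configuration quoted above it should print the two qrsh_* keys.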

> In addition, I also specified the "enable_addgrp_kill" flag with  
> the gid_range parameter on the qmaster:
>
> enable_addgrp_kill           true
> gid_range                    20000-21000
>
> Now, when I launch MPICH-MX jobs on the cluster, I observed the  
> process tree. On the first node where the mpirun command was started:
>
> root     22638  0.2  0.0 58448 1968 ?        S    22:12   0:01 /opt/gridengine/bin/lx26-amd64/sge_execd
> root     26102  0.0  0.0  8508  944 ?        S    22:22   0:00  \_ sge_shepherd-51 -bg
> srihari  26142  0.0  0.0 53836 1144 ?        Ss   22:22   0:00      \_ /bin/bash /opt/gridengine/default/spool/compute-1-1/jo
> srihari  26144  0.0  0.0 69404 4572 ?        S    22:22   0:00          \_ perl -S -w /home/ibm/util/mpich-mx-1.2.7..1/icc/9.
> srihari  26146  0.0  0.0 69404 3800 ?        S    22:22   0:00              \_ perl -S -w /home/ibm/util/mpich-mx-1.2.7..1/ic

It seems the MPICH job is just calling the conventional ssh, not the one
you supplied. To change this, the rsh wrapper must be used: the script in
start_proc_args (of the PE definition) creates a symbolic link to it in
the $TMPDIR of the job.
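
Whether the wrapper is actually picked up can be checked from inside a job. The following is a minimal sketch, assuming the link is named "rsh" and that $TMPDIR comes before the system directories in the job's PATH; the function name is hypothetical:

```shell
# Hypothetical check: succeed only when the first "rsh" found in PATH is
# the one in the job's private directory (default: $TMPDIR).
rsh_is_wrapped() {
    dir=${1:-$TMPDIR}
    found=$(command -v rsh) || return 1
    [ "$found" = "$dir/rsh" ]
}

# Usage inside a job script:
#   rsh_is_wrapped || echo "rsh does not resolve to the wrapper" >&2
```

If the check fails, the PE's start_proc_args is the first thing to look at.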

You can try to set:

export P4_RSHCOMMAND=rsh
export MPICH_PROCESS_GROUP=no

in your job script. This makes MPICH use rsh instead of the compiled-in
ssh. The rsh call then reaches the rsh wrapper, which in turn issues a
"qrsh -inherit", and that uses the command defined for rsh_command,
i.e. the new ssh.
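
Put together, a job script could look roughly like the sketch below. The PE name "mpich", the slot count and the mpirun options are assumptions for illustration; the benchmark path is the one from your process listing, and $TMPDIR/machines is the host list that the PE start script usually writes:

```shell
#!/bin/sh
# Hypothetical PE name and slot count; adjust to your setup.
#$ -S /bin/sh
#$ -cwd
#$ -pe mpich 8

# Make MPICH start remote processes via "rsh"; inside the job that name
# should resolve to the wrapper link in $TMPDIR.
export P4_RSHCOMMAND=rsh
export MPICH_PROCESS_GROUP=no

# Only launch when the host list exists, i.e. when the script really runs
# inside a parallel job whose start procedure has created it.
machines="${TMPDIR:-/tmp}/machines"
if [ -f "$machines" ]; then
    mpirun -np "${NSLOTS:-1}" -machinefile "$machines" \
        /home2/srihari/IMB_3.0/src/IMB-MPI1
fi
```

With this in place the remote processes should show up below sge_shepherd in the process tree rather than below the system sshd.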

http://gridengine.sunsource.net/howto/mpich-integration.html (BTW:
for current Myrinet versions you no longer need to change the Perl
script, but the other hints might still be useful).

-- Reuti


> On the other nodes, the process tree looks like this:
>
> root      3525  0.0  0.0 21928 1268 ?        Ss   Nov15   0:02 /usr/sbin/sshd
> root     31074  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
> srihari  31082  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
> srihari  31310  100  0.1 26208 12216 ?       Rsl  22:30   0:29  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
> root     31076  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
> srihari  31083  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
> srihari  31318 99.8  0.1 26208 12216 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
> root     31077  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
> srihari  31084  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
> srihari  31328 99.3  0.1 27232 13252 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
> root     31079  0.0  0.0 37092 2540 ?        Ss   22:30   0:00  \_ sshd: srihari [priv]
> srihari  31085  0.0  0.0 37224 1812 ?        S    22:30   0:00  |   \_ sshd: srihari@notty
> srihari  31331 99.3  0.1 27232 13252 ?       Rsl  22:30   0:28  |       \_ /home2/srihari/IMB_3.0/src/IMB-MPI1
>
> Now, when I do "qdel" on this job, it doesn't kill all the MPI
> processes in the tree as I was hoping for. So I must still be
> missing something here.
>
> Srihari
>
> ----- Original Message ----
> From: Ron Chen <ron_chen_123 at yahoo.com>
> To: users at gridengine.sunsource.net
> Sent: Monday, November 19, 2007 1:27:54 PM
> Subject: Re: [GE users] Questions on SSH tight integration + Rocks  
> OS 4.3
>
> --- VS Ang <vs_ang at yahoo.com> wrote:
> > First, it's not clear to me what supplementary group IDs are.
> > Also, the release notes refer to the gid_range parameter for
> > the execd "local" configuration on each node. Are these group
> > ID ranges supposed to be non-overlapping on each node?
>
> The "gid_range" is documented in sge_conf(5). As long as the
> range is bigger than the max. number of jobs per node, you don't
> need to change it.
>
> Also, the "gid_range" of one node can overlap with that of other
> nodes, as the group ID space is not shared between the nodes.
>
>
> > Also, when I try to edit the execd configuration on the node
> > using qconf, I am getting the following errors. Does it mean
> > the gid_range parameter is not supported in this version (even
> > though this is 6.0u8)?
>
> "gid_range" was supported even before SGE 5.x.
>
>
> > 2) >./aimk -gcc -no-java -no-jni -no-qtcsh -spool-classic
> > -tight-ssh
> >
> > I would appreciate it if someone can point me to the "right"
> > documentation on implementing the SSH tight integration and
> > tell me the requirements for building the code with tight
> > integration support.
>
> You can take a look at the SGE workshop presentation:
>
> "SGE-openSSH Tight Integration":
>
> http://gridengine.sunsource.net/download/workshop10-12_09_07/SGE-WS2007-openSSHTightIntegration_RonChen.pdf
>
> -Ron
>
>
>
> >
> > Thank you,
> > Srihari
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
