[GE users] Anybody able to get SGE MPICH2 Tight Integration via SSH working?

Reuti reuti at staff.uni-marburg.de
Wed Jan 4 19:17:49 GMT 2006

Hi Jonathan,

first some remarks about security. The easiest way would be to
install two network cards in the head node of the cluster: users
connect to the head node through one, and the second is used only to
reach the nodes from the master. The nodes are then not connected to
the outside world, and break-in attempts could only be made from the
head node, to which only authorized users have access. I can see that
this might not be possible if you use desktop PCs as compute nodes
during the night, unless the desktop PCs also had two network cards.

But in both cases you don't need a running rshd or sshd on the nodes
at all, as SGE uses its own rshd to achieve the Tight Integration.
This special rshd is only started to allow access on a chosen port.
If you still see the need for ssh, you can use it as well, as you
already found in the appropriate Howto. I'm not sure what you mean by
"passphraseless keys". Usually you generate the keypair with
ssh-keygen for each user and then put the public key into their
.ssh/authorized_keys (and maybe adjust the .ssh/known_hosts file).
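
A minimal sketch of that key setup, assuming "passphraseless" simply
means a key pair created with an empty passphrase. KEYDIR is a
scratch directory introduced here so the example does not touch real
keys; for actual use it would be the user's ~/.ssh.

```shell
#!/bin/sh
# Sketch: create a key pair without a passphrase and authorize it.
# KEYDIR is a scratch directory (an assumption of this example);
# for real use it would be ~/.ssh.
KEYDIR=$(mktemp -d)
chmod 700 "$KEYDIR"

# -N "" sets an empty passphrase, -q suppresses the progress output.
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"

# Append the public key to the list of keys allowed to log in.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With home directories shared across the execution hosts, as in your
setup, appending the public key once should be enough for all nodes.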

If you go for ssh, then it shouldn't be necessary to change any of
the start scripts for the parallel library support at all. Just edit
the entries in the SGE config as mentioned in the ssh Howto. The idea
behind the Tight Integration in SGE is:

- the start-script for the PE will create a link called "rsh" in the  
$TMPDIR to the rsh-wrapper

- this link will be found first by a call to "rsh" of the  
application, as $TMPDIR is the first in the generated $PATH

- if the application has a compiled-in "ssh", you could change RSHCMD
to be "ssh", so that a call to ssh also finds the rsh-wrapper (the
link will simply be named ssh, although it still points to the
rsh-wrapper - just a convenience)

- the rsh-wrapper will call SGE's qrsh, which will in the end start a  
private rshd (or if configured sshd) on the slave nodes for this job  
on a dedicated port and start the local rsh/ssh with a command
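
The link-and-PATH mechanism from the list above can be tried without
SGE. This is only a simulation under stated assumptions: DEMO and
JOBTMP are scratch directories standing in for the wrapper's location
and the job's $TMPDIR, the wrapper is a hypothetical stand-in that
prints its arguments instead of calling qrsh, and node01 is a made-up
host name.

```shell
#!/bin/sh
# Simulation of the PATH trick; no SGE installation is needed.
DEMO=$(mktemp -d)      # stands in for the directory holding the wrapper
JOBTMP=$(mktemp -d)    # stands in for the per-job $TMPDIR

# Hypothetical stand-in for the rsh-wrapper. The real wrapper would
# hand the request to SGE's qrsh instead of printing it.
cat > "$DEMO/rsh-wrapper" <<'EOF'
#!/bin/sh
echo "wrapper called as $(basename "$0"): $*"
EOF
chmod +x "$DEMO/rsh-wrapper"

# What the PE start script does: create a link named after the
# remote-shell command the application will call.
RSHCMD=rsh             # "ssh" if the application has ssh compiled in
ln -s "$DEMO/rsh-wrapper" "$JOBTMP/$RSHCMD"

# The job's temporary directory comes first in PATH, so the link
# shadows any system-wide rsh.
PATH="$JOBTMP:$PATH"
RESOLVED=$(command -v "$RSHCMD")
OUTPUT=$("$RSHCMD" node01 hostname)
echo "$RESOLVED"
echo "$OUTPUT"
```

The first line printed is the path of the link inside JOBTMP, which
shows that the application's plain "rsh" call never reaches the
system binary. Setting RSHCMD=ssh changes only the name of the link.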

HTH - Reuti

On 04.01.2006 at 18:27, Jonathan Schreiter wrote:

> Hello all,
> I'm new to SGE, and trying to enable tight integration
> with mpich2 and ssh (SGE 6.07, mpich2 1.0.3, FC4 linux
> 2.6 kernel with latest ssh).  I found the two howtos
> on the project site re ssh integration and
> integration with mpich2 via rsh.  The rsh security
> concerns are the primary reason I'd like to use ssh -
> (specifically ssh based passphraseless keys on a per
> user basis which is a bit better) - also because of
> the requirement to disable the firewall for mpich2 to
> work with dynamic port assignments (even if one
> specifies the primary listen to port).
> If I use the original scripts included in
> $SGE_ROOT/mpi and have smpd started on the execution
> hosts, I am able to successfully submit and execute mpi
> jobs via a PE mpich2 environment on SGE.  I can also
> start the smpd process on the exe hosts via a submit
> job using ssh.  However, I do not know how one could
> ever implement SGE w/o tight integration this way with
> failed scripts / memory leaks / limbo processes, etc.
> So I've been following the section "Tight Integration
> of the daemon-based smpd startup method" closely.
> Looking at the start_mpich2.c file, there doesn't
> appear to be any rsh specific methods that need
> changing (just a fork()).  In startmpi2.sh the area
> where I think needs modification is:
> rshcmd=rsh
> to something like rshcmd="ssh -i <~/.ssh/user's
> passphraseless key>"
> I have the $SGE_ROOT/mpich2_smpd and home directories
> shared on each execution host (and master/submit
> hosts).
> If I try to execute the line:
> $SGE_ROOT/mpich2_smpd/bin/lx24-x86/start_mpich2 -n
> <host> $MPICH2_ROOT/bin/smpd <port> from a bash shell
> I receive connection refused errors (naturally).  I'm
> not 100% sure how the RSH wrapper script and the howto
> on ssh integration work together to make this happen.
> I guess what I'm asking is if anyone was able to get
> this working, and how, rather than reinvent the
> wheel...or perhaps I'm just way off.  I've been
> reading just about all the posts on this mailing list
> and I haven't found anyone who's been successful (or at
> least posted the solution).  It may not even be
> possible given the differences between rsh and ssh.
> Any help would be greatly appreciated!
> Many thanks,
> Jonathan
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
