[GE users] Anybody able to get SGE MPICH2 Tight Integration via SSH working?
jonathanschreiter at yahoo.com
Wed Jan 4 17:27:04 GMT 2006
I'm new to SGE and am trying to enable tight integration
with mpich2 over ssh (SGE 6.07, mpich2 1.0.3, FC4 Linux
2.6 kernel with the latest ssh). I found the two howtos
on the project site: one on ssh integration and one on
integration with mpich2 via rsh. The rsh security
concerns are the primary reason I'd like to use ssh
(specifically ssh-based passphraseless keys on a per-user
basis, which is a bit better). Another reason is
the requirement to disable the firewall for mpich2 to
work with dynamic port assignments (even if one
specifies the primary listen port).
If I use the original scripts included in
$SGE_ROOT/mpi and have smpd started on the execution
hosts, I am able to successfully submit and execute MPI
jobs via a PE mpich2 environment on SGE. I can also
start the smpd process on the execution hosts via a
submitted job using ssh. However, I do not see how one
could run SGE this way in practice without tight
integration, given failed scripts, memory leaks, limbo
processes, etc.
So I've been following the section "Tight Integration
of the daemon-based smpd startup method" closely.
Looking at the start_mpich2.c file, there doesn't
appear to be any rsh specific methods that need
changing (just a fork()). In startmpi2.sh, the part
I think needs modification is:
to something like rshcmd="ssh -i <~/.ssh/user's
I have the $SGE_ROOT/mpich2_smpd and home directories
shared on each execution host (and master/submit
If I try to execute the line:
<host> $MPICH2_ROOT/bin/smpd <port> from a bash shell
I receive connection refused errors (naturally). I'm
not 100% sure how the RSH wrapper script and the howto
on ssh integration work together to make this happen.
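
For illustration only, here is a minimal sketch of the kind of substitution being discussed: turning an rsh-style call (host, then command) into an ssh command line that uses a per-user key. The names build_ssh_cmd and SSH_KEY, the default key path, the host node01, the port 20000 and the fallback MPICH2_ROOT are all assumptions made for the example and do not come from either howto. It only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch: build an ssh command line from rsh-style arguments.
# SSH_KEY and build_ssh_cmd are illustrative names, not taken from the howtos.
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_rsa}"

build_ssh_cmd() {
    host=$1
    shift
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is what a non-interactive batch job needs.
    # The remaining arguments (the remote command) are passed through unchanged.
    echo "ssh -o BatchMode=yes -i $SSH_KEY $host $*"
}

# Dry run: print the command that would start smpd on a remote host.
# node01, 20000 and the /opt/mpich2 fallback are placeholder values.
build_ssh_cmd node01 "${MPICH2_ROOT:-/opt/mpich2}/bin/smpd" -port 20000
```

Note that a plain ssh call like this would not by itself be tight integration, since SGE would have no control over the remote process; it only shows where a key-based ssh command could stand in for rsh.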
I guess what I'm asking is whether anyone has been able
to get this working, and how, rather than reinventing the
wheel...or perhaps I'm just way off. I've been
reading just about all the posts on this mailing list
and I haven't found anyone who's been successful (or at
least posted the solution). It may not even be
possible given the differences between rsh and ssh.
Any help would be greatly appreciated!