[GE users] Qrsh qlogin with ssh on RedHat Linux
reuti at staff.uni-marburg.de
Thu May 15 08:18:29 BST 2008
On 14.05.2008 at 22:56, Lewis, Daniel ((IS Consultant)) wrote:
> SGE 6.1u3 Master on Solaris 10 with SunHPC 7.1
> SGE 6.1u3 Exec on Linux (RHEL 5.1) with OpenMPI 1.2.5
> OpenMPI 1.2.5 was compiled from source with the --with-sge flag,
> ompi-info shows gridengine support.
> linuxcluster.q contains only the Linux cluster hosts
> Linux environment is a Beowulf-style cluster, two interfaces
> including one on a subnet shared by the SGE master.
> In this SGE - OpenMPI mixed environment I'm trying to troubleshoot
> process spawning from within an MPI-enabled application. I can
> exercise "mpirun" all day without any issues, interactively, but
> spawning from within the application results in "A daemon failed to
> start ?"
> I've attempted to use ssh instead of rsh just in case that is a
> factor - however, the directions for replacing rsh with ssh do not
> work on RedHat Linux - for one thing, the directions specify
> "sshd -i" so that sshd will start via xinetd - on RH sshd is not
> started via xinetd. In
The -i only changes the behavior of sshd itself. You don't need xinetd
at all in the cluster. SGE starts one sshd per qrsh command, on a
random port, hence there shouldn't be any firewall on the machines. As
Open MPI is running interactively, I assume this is already the case.
You can check this on a node with "ps -e f": the sshd should be a
child of sge_shepherd.
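If the forest view is hard to read on a busy node, the same parent/child check can be scripted. This is only a rough sketch: the function name is made up for illustration, and it assumes the procps "ps" on the Linux nodes, fed with "PID PPID COMMAND" lines.

```shell
# Hypothetical helper (not part of SGE): print the PIDs of all sshd
# processes whose parent process is an sge_shepherd.
# Expects lines of the form "PID PPID COMMAND" on stdin.
sshd_under_shepherd() {
    awk '
        { comm[$1] = $3; ppid[$1] = $2 }   # remember command and parent per PID
        END {
            for (p in comm)
                if (comm[p] == "sshd" && (ppid[p] in comm) &&
                    comm[ppid[p]] ~ /^sge_shepherd/)
                    print p
        }
    ' | sort -n
}

# On a node, while a qrsh session is open:
#   ps -e -o pid=,ppid=,comm= | sshd_under_shepherd
```

An empty result while a qrsh session is open would suggest that the sshd in use is the regular system daemon and not one started by SGE.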
It's possible to run a cluster without rsh and ssh being available
all the time, and to force users to go through qlogin/qrsh even for an
interactive login on a node.
> my attempts to set that up, sge_execd segfaults on the target machine
> and I get the indication that something is trying to start another
> sshd on port 22. What is the proper value for rlogin daemon and
> rlogin command, etc, for a Linux config?
It should use another port. I wonder how it's still trying to use
> "May 14 12:43:50 marl-clus1-cn08 sshd: error: Bind to port 22
> on 0.0.0.0 failed: Address already in use."
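One thing worth ruling out here: sshd only skips binding its own listening port when it is started with -i (inetd mode), so a "Bind to port 22" error may mean the -i is missing from the setting that is actually in effect. A small sketch to check this, assuming the configuration is dumped with "qconf -sconf" (global) or "qconf -sconf <hostname>" (host-local); the function name is hypothetical.

```shell
# Hypothetical helper: read SGE configuration lines on stdin and print
# the name of every *_daemon entry that runs sshd without the -i option.
daemons_missing_i() {
    awk '
        $1 ~ /^(rsh|rlogin|qlogin)_daemon$/ && $2 ~ /sshd$/ {
            has_i = 0
            for (n = 3; n <= NF; n++)
                if ($n == "-i") has_i = 1
            if (!has_i) print $1
        }
    '
}

# Usage against the live configuration:
#   qconf -sconf | daemons_missing_i
#   qconf -sconf <hostname> | daemons_missing_i
```

Any entry printed here would start sshd as a normal listening daemon. A host-local configuration overrides the global one, so it makes sense to check both.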
> Note, ssh by itself works just fine throughout the environment.
> Has anyone revised the directions at
> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html to work for RH
> Dan L.