[GE users] Help with qrsh, SSH, and LDAP

David Olbersen dolbersen at nextwave.com
Mon Nov 5 18:01:07 GMT 2007


Hello all,

I've got a cluster running Grid Engine 6.0u8 and I'm trying to get qrsh
configured to use SSH.

I already have SSH keys working on the cluster for my own account. The
problem comes when I try to use qrsh. I've followed the instructions
(http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html) and still
have problems.

I slightly modified the directions and set rsh_command to '/usr/bin/ssh
-v' to see what's going on at the SSH level.
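
For reference, a sketch of how the relevant cluster configuration
entries (as edited with qconf -mconf) would look with this change. Only
the rsh_command value is stated above; the rsh_daemon line is an
assumption based on the howto's approach of running sshd in inetd mode,
and the actual path on a given system may differ:

    rsh_command    /usr/bin/ssh -v
    rsh_daemon     /usr/sbin/sshd -i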

On the client side I see the following:

% qrsh -N test -q q2 -verbose
Your job 1019 ("test") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1019 has been successfully scheduled.
Establishing /usr/bin/ssh -v session to host node-2.skunkworks.eng.atg.nw.net ...
OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
debug1: Reading configuration data /users/dolbersen/.ssh/config
debug1: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Connecting to node-2.skunkworks.eng.atg.nw.net [172.24.19.116] port 33257.
debug1: Connection established.
debug1: identity file /users/dolbersen/.ssh/identity type -1
debug1: identity file /users/dolbersen/.ssh/id_rsa type -1
debug1: identity file /users/dolbersen/.ssh/id_dsa type 2
ssh_exchange_identification: Connection closed by remote host
/usr/bin/ssh -v exited with exit code 255

In the qmaster messages I see this:

11/05/2007 09:54:43|qmaster|labmaster|W|job 1019.1 failed on host node-2.skunkworks.eng.atg.nw.net assumedly after job because: job 1019.1 died through signal KILL (9)

And on the exec host (the only host in this queue) I see this in
/var/log/messages:

Nov  5 09:54:42 node-2 sge_shepherd-1019: nss_ldap: reconnecting to LDAP server...
Nov  5 09:54:42 node-2 sge_shepherd-1019: nss_ldap: reconnected to LDAP server after 1 attempt(s)

I can't quite tell what's going on here and could really use some help.
All machines run CentOS 4.4, if that's of any use. This is a lab
cluster, so I'm free to experiment. As I said above, SSH itself works
with my keys when used directly, outside of qrsh.

________________________________

David Olbersen (x0623)
