[GE users] trying tight ssh integration

Gerald Ragghianti geri at utk.edu
Sun Nov 16 04:28:21 GMT 2008

I am trying to get tight ssh integration working on my 6.1u5 system 
using openssh-4.3p1.  After successfully compiling Grid Engine with 
"aimk -no-java -no-secure -spool-classic -no-jni", I compiled openssh 
with "aimk -no-java -no-secure -spool-classic -no-jni -tight-ssh".  This 
produced an sshd binary that I moved to 
$SGE_ROOT/utilbin/lx24-amd64/sshd.  I then updated rsh_daemon to point 
to this binary.  When I execute "qrsh -verbose id", the command returns:

Your job 946 ("id") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 946 has been successfully scheduled.
Establishing /usr/bin/ssh -X  session to host sun15.local ...
/usr/bin/ssh -X  exited with exit code 254
reading exit code from shepherd ... 129
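
For reference, the setup described above corresponds roughly to the 
following cluster configuration entries.  This is only a sketch: the 
rsh_command value is taken from the qrsh output, and the rsh_daemon 
path assumes $SGE_ROOT is /opt/sge, as the qrsh_starter path in the ssh 
debug output suggests.  The post does not say whether any options (for 
example sshd's -i) were appended to rsh_daemon.

```
rsh_command    /usr/bin/ssh -X
rsh_daemon     /opt/sge/utilbin/lx24-amd64/sshd
```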

Log files:

qmaster: job 946.1 failed on host sun15.local assumedly after job 
because: job 946.1 died through signal HUP (1)

On the exec host: reaping job "946" ptf complains: Job does not exist
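
The shepherd exit code 129 and the qmaster message about signal HUP (1) 
look consistent with each other, assuming the shepherd follows the 
common shell convention of reporting 128 plus the signal number.  A 
minimal sketch of that arithmetic (the function name decode_exit is 
hypothetical, not part of Grid Engine):

```shell
# Decode an exit status using the common "128 + signal number" convention.
# Assumption: the shepherd reports signal deaths this way; the post only
# shows the two numbers (129 and HUP (1)) side by side.
decode_exit() {
  code=$1
  # Statuses 129..192 map to signals 1..64 under this convention.
  if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    echo "signal $((code - 128))"
  else
    echo "exit $code"
  fi
}

decode_exit 129   # prints "signal 1"
decode_exit 254   # prints "exit 254"
```

Under that reading, 129 is the HUP reported by qmaster, while the 254 
from ssh is an ordinary exit status rather than a signal.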

When I change rsh_command to "/usr/bin/ssh -vX" I get the following 
debug output:
debug1: Offering public key: /home/user/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 149
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: Sending command: exec '/opt/sge/utilbin/lx24-amd64/qrsh_starter' 
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: channel 0: free: client-session, nchannels 1
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.1 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 254
/usr/bin/ssh -vX  exited with exit code 254
reading exit code from shepherd ... 129

This seems to indicate that the ssh authentication succeeds, but that 
the qrsh_starter fails to execute.  I have an strace of the execd that 
shows sshd being executed and subsequently rummaging around the 
$SGE_ROOT and correctly setting the groupid before exiting.

Any ideas?

Gerald Ragghianti
IT Administrator - High Performance Computing
Office of Information Technology
University of Tennessee
Phone: 865-974-2448
E-mail: geri at utk.edu

