[GE users] trying tight ssh integration

Gerald Ragghianti geri at utk.edu
Wed Nov 19 01:57:33 GMT 2008


I have solved the problem that I was having with tight ssh integration.  
The issue was that the function sgessh_do_setusercontext() is not a 
drop-in replacement for do_setusercontext() in openssh version 4.3p1 
(Redhat EL4.6).  The apparent differences between these two functions 
caused sshd to fail somewhere around where it tries to execute 
qrsh_starter. 

I was able to solve the problem by using openssh-3.9p1.  Everything works 
fine now.  For future reference, here is my procedure for building sshd:
=====================================
#!/bin/sh
# Start from a clean Grid Engine 6.1u5 source tree.
rm -rf gridengine
tar -zxf ge-V61u5_TAG-src.tar.gz
# Overlay the local copies of aimk.site and aimk.
cp -f aimk.site gridengine/source/
cp -f aimk gridengine/source/

# Unpack OpenSSH 3.9p1 into 3rdparty/openssh and drop in the patched sshd.c.
rm -rf openssh-3.9p1 gridengine/source/3rdparty/openssh
tar -zxf openssh-3.9p1.tar.gz
mv openssh-3.9p1 gridengine/source/3rdparty/openssh
cp sshd.c.3.9p1 gridengine/source/3rdparty/openssh/sshd.c

# Build Grid Engine first, then the tight-integration sshd (-tight-ssh).
cd gridengine/source
./aimk -no-java -no-secure -spool-classic -no-jni -only-depend && \
scripts/zerodepend && \
./aimk -no-java -no-secure -spool-classic -no-jni depend && \
./aimk -no-java -no-secure -spool-classic -no-jni && \
./aimk -no-java -no-secure -spool-classic -no-jni -tight-ssh
=====================================
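The script above assumes that five input files sit in the directory it is run from, and it deletes the old tree before checking for any of them. As a rough sketch only (the function name check_build_inputs is made up for illustration and is not part of Grid Engine), a pre-flight check along these lines could be run first:

```sh
#!/bin/sh
# Hypothetical helper, not part of the original procedure.
# Reports any build input missing from directory $1 (default: current dir).
# The file names are taken from the build script above.
check_build_inputs() {
    dir=${1:-.}
    status=0
    for f in ge-V61u5_TAG-src.tar.gz aimk.site aimk \
             openssh-3.9p1.tar.gz sshd.c.3.9p1
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing input: $dir/$f" >&2
            status=1
        fi
    done
    # Returns 0 only if every input file is present.
    return "$status"
}
```

Calling "check_build_inputs . || exit 1" before the first rm -rf would stop the build early instead of leaving a half-unpacked tree.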
Here is the diff that I used to patch sshd.c:
105a106,111
> #define SGESSH_INTEGRATION
> #ifdef SGESSH_INTEGRATION
> extern int sgessh_readconfig(void);
> extern int sgessh_do_setusercontext(struct passwd *);
> #endif
>
675c681,686
<       do_setusercontext(authctxt->pw);
---
>       /* do_setusercontext(authctxt->pw); */
>  #ifdef SGESSH_INTEGRATION
>    sgessh_do_setusercontext(authctxt->pw);
>  #else
>    do_setusercontext(authctxt->pw);
>  #endif
899a911,914
>  #ifdef SGESSH_INTEGRATION
>    sgessh_readconfig();
>  #endif
>
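
Since the line numbers in this diff only apply to the 3.9p1 sshd.c, it may be worth confirming that a patched file really contains the three hooks before building. The following is a minimal sketch, assuming the patched file uses exactly the strings shown in the diff; the function name check_sgessh_hooks is hypothetical:

```sh
#!/bin/sh
# Hypothetical helper, not part of Grid Engine or OpenSSH.
# Checks that the sshd.c given as $1 contains the define and the two
# calls added by the diff above. Matching is on fixed strings (grep -F).
check_sgessh_hooks() {
    file=$1
    missing=0
    for pattern in \
        '#define SGESSH_INTEGRATION' \
        'sgessh_readconfig();' \
        'sgessh_do_setusercontext(authctxt->pw);'
    do
        if ! grep -qF -- "$pattern" "$file"; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    # Returns 0 only if all three strings were found.
    return "$missing"
}
```

For example, "check_sgessh_hooks gridengine/source/3rdparty/openssh/sshd.c" could be run just before the final aimk call. It only checks that the strings are present, not that they sit in the right place in the file.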

Once you have an sshd binary, you simply need to copy it somewhere on the 
execd machines and reference it in the cluster configuration as

rsh_daemon                   /opt/n1ge/utilbin/lx24_amd64/sshd -i
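
To double-check which binary the configuration points at, the rsh_daemon entry can be pulled out of the output of "qconf -sconf". This is only a sketch: the function name rsh_daemon_path is made up, and it assumes the configuration dump has one "name value" pair per line as in the entry above.

```sh
#!/bin/sh
# Hypothetical helper: reads a configuration dump on stdin and prints the
# command path of the first rsh_daemon entry (options such as -i are dropped).
rsh_daemon_path() {
    awk '$1 == "rsh_daemon" { print $2; exit }'
}

# Possible use on an execd host (assumes qconf is in PATH):
#   daemon=$(qconf -sconf | rsh_daemon_path)
#   [ -x "$daemon" ] || echo "rsh_daemon not executable: $daemon" >&2
```

The check has to run on the execd machines themselves, since that is where the path must exist.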

- Gerald

rayson wrote:
> There are only 2 things added in the tight sshd. One is for reading of
> the environment (the job file), and the other is for switching of the
> user account from root to the actual user using the SGE way.
>
> Can you add a few debug fprintf()s around sgessh_readconfig() and
> sgessh_do_setusercontext() and record which one is causing the
> failure? You may need to log to a file as you may not have
> stdout/stderr access. If you don't know C, I may be able to write some
> code for you when I have time later this month...
>
> There is also a presentation on the tight integration:
> http://gridengine.sunsource.net/download/workshop10-12_09_07/SGE-WS2007-openSSHTightIntegration_RonChen.pdf
>
> Rayson
>
>
>
> On 11/15/08, Gerald Ragghianti <geri at utk.edu> wrote:
>   
>> Yes, that's a good point that I left off.  We have been using "loose"
>> ssh integration for a while now.  I can easily switch between using the
>> stock distribution sshd (which works) and the sgesshd (which doesn't).
>>
>> - Gerald
>>
>>     
>>> Before trying the tight-integration sshd, please make sure that
>>> non-tight integration
>>> (http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html) works.
>>>
>>> Rayson
>>>
>>>
>>>
>>> On 11/15/08, Gerald Ragghianti <geri at utk.edu> wrote:
>>>
>>>       
>>>> I am trying to get tight ssh integration working on my 6.1u5 system
>>>> using openssh-4.3p1.  After successfully compiling with "aimk -no-java
>>>> -no-secure -spool-classic -no-jni" I then compiled openssh with "aimk
>>>> -no-java -no-secure -spool-classic -no-jni -tight-ssh".  This resulted
>>>> in an sshd binary that I moved to $SGE_ROOT/utilbin/lx24-amd64/sshd.  I
>>>> then updated rsh_daemon to point to this binary.  When I execute "qrsh
>>>> -verbose id", the command returns:
>>>>
>>>> Your job 946 ("id") has been submitted
>>>> waiting for interactive job to be scheduled ...
>>>> Your interactive job 946 has been successfully scheduled.
>>>> Establishing /usr/bin/ssh -X  session to host sun15.local ...
>>>> /usr/bin/ssh -X  exited with exit code 254
>>>> reading exit code from shepherd ... 129
>>>>
>>>> Log files:
>>>>
>>>> qmaster: job 946.1 failed on host sun15.local assumedly after job
>>>> because: job 946.1 died through signal HUP (1)
>>>>
>>>> On the exec host: reaping job "946" ptf complains: Job does not exist
>>>>
>>>> When I change rsh_command to "/usr/bin/ssh -vX" I get the following from
>>>> qrsh:
>>>> ...
>>>> debug1: Offering public key: /home/user/.ssh/id_rsa
>>>> debug1: Server accepts key: pkalg ssh-rsa blen 149
>>>> debug1: read PEM private key done: type RSA
>>>> debug1: Authentication succeeded (publickey).
>>>> debug1: channel 0: new [client-session]
>>>> debug1: Entering interactive session.
>>>> debug1: Requesting X11 forwarding with authentication spoofing.
>>>> debug1: Sending command: exec '/opt/sge/utilbin/lx24-amd64/qrsh_starter'
>>>> '/opt/sge/default/spool/sun15/active_jobs/946.1'
>>>> debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
>>>> debug1: channel 0: free: client-session, nchannels 1
>>>> debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.1 seconds
>>>> debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
>>>> debug1: Exit status 254
>>>> /usr/bin/ssh -vX  exited with exit code 254
>>>> reading exit code from shepherd ... 129
>>>>
>>>> This seems to indicate that the ssh authentication succeeds, but that
>>>> the qrsh_starter fails to execute.  I have an strace of the execd that
>>>> shows sshd being executed and subsequently rummaging around the
>>>> $SGE_ROOT and correctly setting the groupid before exiting.
>>>>
>>>> Any ideas?
>>>>
>>>> --
>>>> Gerald Ragghianti
>>>> IT Administrator - High Performance Computing
>>>> http://hpc.usg.utk.edu/
>>>> Office of Information Technology
>>>> University of Tennessee
>>>> Phone: 865-974-2448
>>>> E-mail: geri at utk.edu
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88823
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>>>>
>>>>         
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88824
>>>
>>>
>>>       
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88825
>>
>>
>>     
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88827
>
>   


-- 
Gerald Ragghianti
IT Administrator - High Performance Computing
http://hpc.usg.utk.edu/
Office of Information Technology
University of Tennessee
Phone: 865-974-2448
E-mail: geri at utk.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89030
