[GE users] trying tight ssh integration

Gerald Ragghianti geri at utk.edu
Wed Nov 19 05:27:52 GMT 2008


Hi Rayson,
I think that the problem is related to the use of PAM on my system.  If
you look at session.c, there are a lot of calls to PAM functions that do
things like set up ulimits.  This was a problem for us because our
Open MPI over IB applications cannot run unless the limits from
limits.conf are applied to the user's process.  My quick solution was
to add a few PAM calls before setting the user context:

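 #ifdef SGESSH_INTEGRATION
   /* Open the PAM session and set credentials while still root, so the
      session modules (e.g. the one applying limits.conf) run first. */
   do_pam_session();
   do_pam_setcred(0);
   sgessh_do_setusercontext(authctxt->pw);
 #else
   do_setusercontext(authctxt->pw);
 #endif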

I don't know if this is the cleanest solution, but it does work.

- Gerald

rayson wrote:
> Great to hear that it works for you!!
>
> I downloaded openssh-4.3p1.tar.gz a few days ago but I didn't have
> time to look into the issue :-(
>
> Also, 5.1 came out, and I have not tried it with the tight
> integration either. BTW, can you provide more detail about the difference
> between sgessh_do_setusercontext() and do_setusercontext()?
>
> Thanks,
> Rayson
>
>
>
> On 11/18/08, Gerald Ragghianti <geri at utk.edu> wrote:
>   
>> I have solved the problem that I was having with tight ssh integration.
>> The issue was that the function sgessh_do_setusercontext() is not a
>> drop-in replacement for do_setusercontext() in openssh version 4.3p1
>> (Redhat EL4.6).  The apparent differences between these two functions
>> caused sshd to fail somewhere around where it tries to execute
>> qrsh_starter.
>>
>> I was able to solve the problem by using openssh-3.9p1.  Everything works
>> fine now.  For future reference, here is my procedure for building sshd:
>> =====================================
>> #!/bin/sh
>> rm -rf gridengine
>> tar -zxf ge-V61u5_TAG-src.tar.gz
>> cp -f aimk.site gridengine/source/
>> cp -f aimk gridengine/source/
>>
>> rm -rf openssh-3.9p1 gridengine/source/3rdparty/openssh
>> tar -zxf openssh-3.9p1.tar.gz
>> mv openssh-3.9p1 gridengine/source/3rdparty/openssh
>> cp sshd.c.3.9p1 gridengine/source/3rdparty/openssh/sshd.c
>>
>> cd gridengine/source
>> ./aimk -no-java -no-secure -spool-classic -no-jni -only-depend && \
>> scripts/zerodepend && \
>> ./aimk -no-java -no-secure -spool-classic -no-jni depend && \
>> ./aimk -no-java -no-secure -spool-classic -no-jni && \
>> ./aimk -no-java -no-secure -spool-classic -no-jni -tight-ssh
>> =====================================
>> Here is the diff that I used to patch sshd.c:
>> 105a106,111
>>  > #define SGESSH_INTEGRATION
>>  > #ifdef SGESSH_INTEGRATION
>>  > extern int sgessh_readconfig(void);
>>  > extern int sgessh_do_setusercontext(struct passwd *);
>>  > #endif
>>  >
>> 675c681,686
>> <       do_setusercontext(authctxt->pw);
>> ---
>>  >       /* do_setusercontext(authctxt->pw); */
>>  >  #ifdef SGESSH_INTEGRATION
>>  >    sgessh_do_setusercontext(authctxt->pw);
>>  >  #else
>>  >    do_setusercontext(authctxt->pw);
>>  >  #endif
>> 899a911,914
>>  >  #ifdef SGESSH_INTEGRATION
>>  >    sgessh_readconfig();
>>  >  #endif
>>  >
>>
>> Once you have an sshd binary, you simply need to copy it somewhere on the
>> execd machines and reference it as
>>
>> rsh_daemon                   /opt/n1ge/utilbin/lx24_amd64/sshd -i
>>
>> - Gerald
>>
>> rayson wrote:
>>     
>>> There are only 2 things added in the tight sshd. One is for reading of
>>> the environment (the job file), and the other is for switching of the
>>> user account from root to the actual user using the SGE way.
>>>
>>> Can you add a few debug fprintf()s around sgessh_readconfig() and
>>> sgessh_do_setusercontext() and record which one is causing the
>>> failure? You may need to log to a file, as you may not have
>>> stdout/stderr access. If you don't know C, I may be able to write some
>>> code for you when I have time later this month...
>>>
>>> There is also a presentation on the tight integration:
>>> http://gridengine.sunsource.net/download/workshop10-12_09_07/SGE-WS2007-openSSHTightIntegration_RonChen.pdf
>>>
>>> Rayson
>>>
>>>
>>>
>>> On 11/15/08, Gerald Ragghianti <geri at utk.edu> wrote:
>>>
>>>       
>>>> Yes, that's a good point that I left off.  We have been using "loose"
>>>> ssh integration for a while now.  I can easily switch between using the
>>>> stock distribution sshd (which works) and the sgesshd (which doesn't).
>>>>
>>>> - Gerald
>>>>
>>>>
>>>>         
>>>>> Before trying the tight-integration sshd, please make sure that
>>>>> non-tight integration
>>>>> (http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html) works.
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>
>>>>> On 11/15/08, Gerald Ragghianti <geri at utk.edu> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> I am trying to get tight ssh integration working on my 6.1u5 system
>>>>>> using openssh-4.3p1.  After successfully compiling with "aimk -no-java
>>>>>> -no-secure -spool-classic -no-jni" I then compiled openssh with "aimk
>>>>>> -no-java -no-secure -spool-classic -no-jni -tight-ssh".  This resulted
>>>>>> in an sshd binary that I moved to $SGE_ROOT/utilbin/lx24-amd64/sshd.  I
>>>>>> then updated rsh_daemon to point to this binary.  When I execute "qrsh
>>>>>> -verbose id", the command returns:
>>>>>>
>>>>>> Your job 946 ("id") has been submitted
>>>>>> waiting for interactive job to be scheduled ...
>>>>>> Your interactive job 946 has been successfully scheduled.
>>>>>> Establishing /usr/bin/ssh -X  session to host sun15.local ...
>>>>>> /usr/bin/ssh -X  exited with exit code 254
>>>>>> reading exit code from shepherd ... 129
>>>>>>
>>>>>> Log files:
>>>>>>
>>>>>> qmaster: job 946.1 failed on host sun15.local assumedly after job
>>>>>> because: job 946.1 died through signal HUP (1)
>>>>>>
>>>>>> On the exec host: reaping job "946" ptf complains: Job does not exist
>>>>>>
>>>>>> When I change rsh_command to "/usr/bin/ssh -vX" I get the following from
>>>>>> qrsh:
>>>>>> ...
>>>>>> debug1: Offering public key: /home/user/.ssh/id_rsa
>>>>>> debug1: Server accepts key: pkalg ssh-rsa blen 149
>>>>>> debug1: read PEM private key done: type RSA
>>>>>> debug1: Authentication succeeded (publickey).
>>>>>> debug1: channel 0: new [client-session]
>>>>>> debug1: Entering interactive session.
>>>>>> debug1: Requesting X11 forwarding with authentication spoofing.
>>>>>> debug1: Sending command: exec '/opt/sge/utilbin/lx24-amd64/qrsh_starter'
>>>>>> '/opt/sge/default/spool/sun15/active_jobs/946.1'
>>>>>> debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
>>>>>> debug1: channel 0: free: client-session, nchannels 1
>>>>>> debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 0.1 seconds
>>>>>> debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
>>>>>> debug1: Exit status 254
>>>>>> /usr/bin/ssh -vX  exited with exit code 254
>>>>>> reading exit code from shepherd ... 129
>>>>>>
>>>>>> This seems to indicate that the ssh authentication succeeds, but that
>>>>>> the qrsh_starter fails to execute.  I have an strace of the execd that
>>>>>> shows sshd being executed and subsequently rummaging around the
>>>>>> $SGE_ROOT and correctly setting the groupid before exiting.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> --
>>>>>> Gerald Ragghianti
>>>>>> IT Administrator - High Performance Computing
>>>>>> http://hpc.usg.utk.edu/
>>>>>> Office of Information Technology
>>>>>> University of Tennessee
>>>>>> Phone: 865-974-2448
>>>>>> E-mail: geri at utk.edu
>>>>>>
>>>>>> ------------------------------------------------------
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88823
>>>>>>
>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


-- 
Gerald Ragghianti
IT Administrator - High Performance Computing
http://hpc.usg.utk.edu/
Office of Information Technology
University of Tennessee
Phone: 865-974-2448
E-mail: geri at utk.edu

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89034
