[GE users] qlogin and sshd errors (and JOB_ID qlogin environment)

Reuti reuti at staff.uni-marburg.de
Wed Nov 2 16:22:29 GMT 2005


Joe Landman wrote:
> Hi Federico:
> 
>   This could cause you some problems.  Imagine spawning off multiple 
> independent processes that all seek to start messing with the 
> environment file.  Chaos is likely what you will run into.

The complete environment is also in an SGE spool file, for example:

/var/spool/sge/node29/active_jobs/1147.1/environment
/var/spool/sge/node12/active_jobs/1142.1/1.node12/environment

Of course, you need to know the node and job number to source it. - Reuti
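
As a minimal sketch of sourcing such a file, the function below exports every NAME=value line of a given file into the current shell. It assumes the spool file holds one NAME=value pair per line with no shell quoting, which is an assumption about the file format and is not confirmed in this thread. The name load_sge_env is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): export each NAME=value line of FILE.
# Assumes one NAME=value pair per line, with no quoting or escaping.
load_sge_env() {
    env_file=$1
    [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip lines that contain no '=' at all.
        case $line in
            *=*) ;;
            *) continue ;;
        esac
        name=${line%%=*}    # text before the first '='
        value=${line#*=}    # text after the first '='
        # Skip names that are not valid shell identifiers.
        case $name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        export "$name=$value"
    done < "$env_file"
}
```

With the example path above this would be called as: load_sge_env /var/spool/sge/node29/active_jobs/1147.1/environment. Values that span several lines would not survive this line-by-line approach.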

> 
>   What we have taken to doing is to place any environment bits in our 
> job script itself.  This way the environment is local.  If you need 
> something like X, you might need some more complex things (such as 
> making sure that the ForwardX11 option is set true on the compute nodes)
> 
> Joe
> 
> Sacerdoti, Federico wrote:
> 
>> Actually, I found the problem: I am using qlogin over ssh, and sshd
>> ignores all environment variables when it starts. Basically sge_shepherd
>> does set the correct environment, but the variables don't make it to the
>> final session.
>>
>> The workaround is to use an sshd wrapper. This strategy hijacks the
>> $HOME/.ssh/environment facility:
>>
>> ---
>> #!/bin/sh
>> # Author: D.E.Shaw R&D LLC, F.D.Sacerdoti 2005
>> #
>> # SSHD will erase the helpful env vars that sge puts in. This forces
>> # them to survive, but we usurp the $HOME/.ssh/environment file.
>> #
>> env > $HOME/.ssh/environment
>> echo "SGE_HOSTLIST=$SGE_O_HOME/$JOB_NAME.po$JOB_ID" >> $HOME/.ssh/environment
>>
>> /usr/sbin/sshd -i -b 512 -o 'AcceptEnv *' -o 'PermitUserEnvironment yes'
>>
>> rm -f $HOME/.ssh/environment
>> ---
>>
>> But it works. A better way would be to instruct sshd to simply absorb
>> the calling environment, but there does not seem to be a flag or option
>> for that.
>>
>> -Federico
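
One possible refinement of the wrapper above, sketched under assumptions: instead of dumping the whole environment, write only the SGE-related variables, and replace the file with a rename so that a reader never sees a half-written file. The function name write_sge_environment and the choice of which variables to keep are illustrative, not taken from SGE or OpenSSH. This narrows the window for the concurrent-writer problem raised earlier in the thread but does not remove it, since all sessions of one user still share a single $HOME/.ssh/environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or OpenSSH): write only SGE-related
# variables to TARGET and replace the file atomically.
write_sge_environment() {
    target=$1
    dir=$(dirname "$target")
    # Create the temp file in the same directory so mv is a plain rename.
    tmp=$(mktemp "$dir/environment.XXXXXX") || return 1
    # Keep SGE_* plus JOB_ID and JOB_NAME (an assumed selection; extend as needed).
    # grep exits non-zero when nothing matches, which is not an error here.
    env | grep -E '^(SGE_[A-Za-z0-9_]*|JOB_ID|JOB_NAME)=' > "$tmp" || :
    chmod 600 "$tmp"
    mv -f "$tmp" "$target"
}
```

In the wrapper this would stand in for the "env > $HOME/.ssh/environment" line, called as: write_sge_environment "$HOME/.ssh/environment".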
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Tuesday, November 01, 2005 4:47 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qlogin and sshd errors (and JOB_ID qlogin environment)
>>
>>
>> Hi Federico,
>>
>> On 01.11.2005 at 21:42, Sacerdoti, Federico wrote:
>>
>>
>>> Thanks Reuti,
>>>
>>> I found the problem for qlogin/qrsh. The per-host configuration of
>>> 'qlogin_daemon' was set to '/usr/sbin/in.telnetd', so no matter what the
>>> default (cluster-wide) value was, the /usr/sbin/in.telnetd daemon was
>>> started. Once this was fixed things went smoothly.
>>>
>>> I have another question. When I qlogin/qrsh via SSH, I do not get the
>>> JOB_ID environment variable. In fact none of the SGE_O_* variables are
>>> available. I have turned on
>>>
>>> -o 'SendEnv *' and
>>>
>>> -o 'AcceptEnv *'
>>
>>
>>
>> This would send the variables from the login machine to the selected
>> node for your interactive job. But on the login machine, too, only
>> the normal environment is set. You can try to set them by hand or as
>> Roland suggested for interactive qrsh:
>>
>> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=13465
>>
>> You can try to use /bin/sh or /bin/bash (the sourced files will be
>> different, although it's the same binary on Linux). The options -i
>> and -l (lowercase L) might also be interesting. Sometimes you don't get
>> a prompt and just have to type the commands. For qlogin it may be
>> impossible for now.
>>
>> Cheers - Reuti
>>
>>
>>
>>> but to no avail.
>>>
>>> Thanks,
>>> -Federico
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: Tuesday, October 25, 2005 5:03 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] qlogin and sshd errors
>>>
>>>
>>> Correct, SGE is installed so that the daemons run as root, which is
>>> the suggested mode of operation. Is yours running under your user
>>> account? In that case you can submit only serial jobs, and qrsh
>>> for parallel jobs won't work either.
>>>
>>> You can check this e.g. with:
>>>
>>> $ ps -e f -o ruser,euser,rgroup,egroup,command
>>> ...
>>> root     sgeadmin root     gridware /usr/sge/bin/lx24-x86/sge_execd
>>> root     sgeadmin root     gridware  \_ sge_shepherd-374 -bg
>>>
>>>
>>> Cheers  - Reuti
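
The check above can also be scripted. The following is a minimal sketch, assuming a ps that supports the POSIX -o option with empty headers; execd_real_user is a made-up name. It reads "ruser command" pairs on standard input and prints the real user of the first sge_execd process it sees.

```shell
#!/bin/sh
# Hypothetical helper: read "ruser command" pairs on stdin and print the
# real user of the first sge_execd process found (prints nothing if none).
execd_real_user() {
    # Match the bare name or a full path ending in /sge_execd.
    awk '$2 ~ /(^|\/)sge_execd$/ { print $1; exit }'
}
```

It would be fed like this: ps -e -o ruser= -o comm= | execd_real_user. Going by the message above, a standard installation should report root.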
>>>
>>>
>>> On 25.10.2005 at 20:29, Sacerdoti, Federico wrote:
>>>
>>>
>>>> Thanks, it is good to know it works for you.
>>>>
>>>> Do you run sge as root? I am seeing permissions problems with sshd...
>>>>
>>>> -fds
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: Monday, October 24, 2005 4:48 PM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] qlogin and sshd errors
>>>>
>>>>
>>>> Hi Federico,
>>>>
>>>> On 24.10.2005 at 21:39, Sacerdoti, Federico wrote:
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> I apologize if this has already been answered. I would like to use
>>>>> qlogin with ssh, and followed the instructions here
>>>>>
>>>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>>>
>>>>> While qlogin does schedule my job correctly, and the sshd gets started,
>>>>> I cannot connect to it. My qlogin-wrapper shows which port and host to
>>>>> connect to (I have restricted my sge pool to one host to make things
>>>>> easier).
>>>>>
>>>>> I get the following strange error when I try to connect to the port that
>>>>> SGE wants me to. Has anyone seen this?
>>>>>
>>>>> [fds at drdab000 .ssh]$ ssh -vvv drda1054 -p 35072
>>>>> OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
>>>>> debug2: ssh_connect: needpriv 0
>>>>> debug1: Connecting to drda1054 [10.255.4.60] port 35072.
>>>>> debug1: Connection established.
>>>>> debug1: identity file /u/fds/.ssh/identity type -1
>>>>> debug3: Not a RSA1 key file /u/fds/.ssh/id_rsa.
>>>>> debug2: key_type_from_name: unknown key type '-----BEGIN'
>>>>> debug3: key_read: missing keytype
>>>>> debug2: key_type_from_name: unknown key type 'Proc-Type:'
>>>>> debug3: key_read: missing keytype
>>>>> debug2: key_type_from_name: unknown key type 'DEK-Info:'
>>>>> debug3: key_read: missing keytype
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug3: key_read: missing whitespace
>>>>> debug2: key_type_from_name: unknown key type '-----END'
>>>>> debug3: key_read: missing keytype
>>>>> debug1: identity file /u/fds/.ssh/id_rsa type 1
>>>>> debug1: identity file /u/fds/.ssh/id_dsa type -1
>>>>> ssh_exchange_identification: Connection closed by remote host
>>>>>
>>>>
>>>> For me it's working under 6.0u4 and SuSE 9.3. So it may be an issue
>>>> with your ssh setup. Did you create the keys with ssh-keygen and copy
>>>> the public one to authorized_keys? Can you try to delete the key
>>>> information and generate new ones?
>>>>
>>>> The only difference is the version: "OpenSSH_3.9p1, OpenSSL 0.9.7e 25 Oct
>>>> 2004" for me. Maybe you are getting the "e" version from the
>>>> included libs in SGE: "ldd /usr/bin/ssh". You can try to change the
>>>> LD_LIBRARY_PATH (same OS on the nodes and your login machine?). -
>>>> Reuti
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net





