[GE users] qlogin and sshd errors (and JOB_ID qlogin environment)

Joe Landman landman at scalableinformatics.com
Wed Nov 2 16:04:26 GMT 2005



Hi Federico:

   This could cause you some problems.  Imagine several independent 
sessions starting at about the same time, each one rewriting (and later 
deleting) the same environment file.  Chaos is likely what you will run into.
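
   A minimal sketch of that failure mode, under stated assumptions: the 
temporary file stands in for $HOME/.ssh/environment, and the job IDs 101 
and 102 and the write_env helper are made up for illustration.  Nothing 
here talks to SGE or sshd; it only shows that the last writer wins.

```shell
#!/bin/sh
# Two simulated qlogin sessions share one environment file.
envfile=$(mktemp)          # stand-in for $HOME/.ssh/environment

# Hypothetical helper: what each wrapper does before starting sshd.
write_env() {
    echo "JOB_ID=$1" > "$envfile"
}

write_env 101              # session for job 101 writes its environment
write_env 102              # job 102 starts before 101's sshd has read the file

# Session 101 now reads the file and picks up the other job's value.
seen=$(sed -n 's/^JOB_ID=//p' "$envfile")
echo "session 101 sees JOB_ID=$seen"

rm -f "$envfile"
```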

   What we have taken to doing is setting any environment variables in 
the job script itself, so the environment stays local to that job.  If 
you need something like X, you may need a bit more (such as making sure 
that the ForwardX11 option is set to yes on the compute nodes).
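
   As a rough sketch of a job script that carries its own environment: 
the `#$` lines are ordinary embedded SGE options, but APP_HOME, its 
default path /opt/myapp, SCRATCH_DIR and the setup_job_env function are 
hypothetical names chosen for illustration, not anything from this thread.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
# Everything the job needs is set here, so nothing depends on a shared
# file such as $HOME/.ssh/environment.

setup_job_env() {
    # APP_HOME and its default location are assumptions for this sketch.
    APP_HOME="${APP_HOME:-/opt/myapp}"
    PATH="$APP_HOME/bin:$PATH"
    # JOB_ID and TMPDIR are set by SGE for batch jobs; fall back to
    # placeholders when the script is run by hand.
    SCRATCH_DIR="${TMPDIR:-/tmp}/job.${JOB_ID:-manual}"
    export APP_HOME PATH SCRATCH_DIR
}

setup_job_env
echo "job ${JOB_ID:-manual}: scratch=$SCRATCH_DIR"
```

   Submitted with qsub, each job evaluates these settings in its own 
shell, so concurrent jobs cannot overwrite each other's values.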

Joe

Sacerdoti, Federico wrote:
> Actually, I found the problem: I am using qlogin over ssh, and the sshd
> ignores all environment variables when it starts. Basically sge_shepherd
> does set the correct environment, but it doesn't make it to the final
> session.
> 
> The workaround is to use an sshd wrapper. This strategy hijacks the
> $HOME/.ssh/environment facility:
> 
> ---
> #!/bin/sh
> # Author: D.E.Shaw R&D LLC, F.D.Sacerdoti 2005
> #
> # SSHD will erase the helpful env vars that sge puts in. This forces
> # them to survive, but we usurp the $HOME/.ssh/environment file.
> #
> env > $HOME/.ssh/environment
> echo "SGE_HOSTLIST=$SGE_O_HOME/$JOB_NAME.po$JOB_ID" >> $HOME/.ssh/environment
> 
> /usr/sbin/sshd -i -b 512 -o 'AcceptEnv *' -o 'PermitUserEnvironment yes'
> 
> rm -f $HOME/.ssh/environment
> ---
> 
> But it works. A better way would be to instruct sshd to simply absorb
> the calling environment, but there does not seem to be a flag or option
> for that.
> 
> -Federico
> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, November 01, 2005 4:47 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qlogin and sshd errors (and JOB_ID qlogin
> environment)
> 
> 
> Hi Federico,
> 
> On 01.11.2005 at 21:42, Sacerdoti, Federico wrote:
> 
> 
>>Thanks Reuti,
>>
>>I found the problem for qlogin/qrsh. The per-host configuration of
>>'qlogin_daemon' was set to '/usr/sbin/in.telnetd', so no matter what the
>>default (cluster-wide) value is, the /usr/sbin/in.telnetd daemon was
>>started. Once this was fixed things went smoothly.
>>
>>I have another question. When I qlogin/qrsh via SSH, I do not get the
>>JOB_ID environment variable. In fact none of the SGE_O_* variables are
>>available. I have turned on
>>
>>-o 'SendEnv *' and
>>
>>-o 'AcceptEnv *'
> 
> 
> This would send the variables from the login machine to the selected
> node for your interactive job. But on the login machine, too, only
> the normal environment is set. You can try to set them by hand, or do as
> Roland suggested for interactive qrsh:
> 
> http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=13465
> 
> You can try to use /bin/sh or /bin/bash (the sourced files will be
> different, although it's the same binary on Linux). The options -i
> and -l (lowercase L) might also be interesting. Sometimes you don't get
> a prompt and just have to type the commands. For qlogin it may be
> impossible for now.
> 
> Cheers - Reuti
> 
> 
> 
>>but to no avail.
>>
>>Thanks,
>>-Federico
>>
>>-----Original Message-----
>>From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>Sent: Tuesday, October 25, 2005 5:03 PM
>>To: users at gridengine.sunsource.net
>>Subject: Re: [GE users] qlogin and sshd errors
>>
>>
>>Correct, SGE is installed so that the daemons run as root, which is
>>the suggested mode of operation. Is yours running under your user
>>account? In that case you can submit only serial jobs, and the qrsh
>>for parallel jobs won't work either.
>>
>>You can check this e.g. with:
>>
>>$ ps -e f -o ruser,euser,rgroup,egroup,command
>>...
>>root     sgeadmin root     gridware /usr/sge/bin/lx24-x86/sge_execd
>>root     sgeadmin root     gridware  \_ sge_shepherd-374 -bg
>>
>>
>>Cheers  - Reuti
>>
>>
>>On 25.10.2005 at 20:29, Sacerdoti, Federico wrote:
>>
>>
>>>Thanks, it is good to know it works for you.
>>>
>>>Do you run sge as root? I am seeing permissions problems with sshd...
>>>
>>>-fds
>>>
>>>-----Original Message-----
>>>From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>Sent: Monday, October 24, 2005 4:48 PM
>>>To: users at gridengine.sunsource.net
>>>Subject: Re: [GE users] qlogin and sshd errors
>>>
>>>
>>>Hi Federico,
>>>
>>>On 24.10.2005 at 21:39, Sacerdoti, Federico wrote:
>>>
>>>
>>>
>>>>Hi,
>>>>
>>>>I apologize if this has already been answered. I would like to use
>>>>qlogin with ssh, and followed the instructions here
>>>>
>>>>http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>>
>>>>While qlogin does schedule my job correctly, and the sshd gets started,
>>>>I cannot connect to it. My qlogin-wrapper shows which port and host to
>>>>connect to (I have restricted my sge pool to one host to make things
>>>>easier).
>>>>
>>>>I get the following strange error when I try to connect to the port that
>>>>SGE wants me to. Has anyone seen this?
>>>>
>>>>[fds at drdab000 .ssh]$ ssh -vvv drda1054 -p 35072
>>>>OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
>>>>debug2: ssh_connect: needpriv 0
>>>>debug1: Connecting to drda1054 [10.255.4.60] port 35072.
>>>>debug1: Connection established.
>>>>debug1: identity file /u/fds/.ssh/identity type -1
>>>>debug3: Not a RSA1 key file /u/fds/.ssh/id_rsa.
>>>>debug2: key_type_from_name: unknown key type '-----BEGIN'
>>>>debug3: key_read: missing keytype
>>>>debug2: key_type_from_name: unknown key type 'Proc-Type:'
>>>>debug3: key_read: missing keytype
>>>>debug2: key_type_from_name: unknown key type 'DEK-Info:'
>>>>debug3: key_read: missing keytype
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug3: key_read: missing whitespace
>>>>debug2: key_type_from_name: unknown key type '-----END'
>>>>debug3: key_read: missing keytype
>>>>debug1: identity file /u/fds/.ssh/id_rsa type 1
>>>>debug1: identity file /u/fds/.ssh/id_dsa type -1
>>>>ssh_exchange_identification: Connection closed by remote host
>>>>
>>>
>>>For me it's working under 6.0u4 and SuSE 9.3, so it may be an issue
>>>with your ssh setup. Did you create the keys with ssh-keygen and copy
>>>the public one to authorized_keys? Can you try deleting the key
>>>information and generating new ones?
>>>
>>>The only difference is the version: "OpenSSH_3.9p1, OpenSSL 0.9.7e 25 Oct
>>>2004"  for me. Maybe you are getting the "e" version from the
>>>included libs in SGE: "ldd /usr/bin/ssh". You can try to change the
>>>LD_LIBRARY_PATH (same OS on the nodes and your login machine?). -
>>>Reuti
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615

