[GE users] qlogin / ssh (ultimately, X forwarding)

rumpelkeks tina.friedrich at diamond.ac.uk
Wed Feb 3 16:34:42 GMT 2010


Right. I've reverted the config for my test node and started over. This
is what I did and what happens; maybe I'm doing something wrong and
someone can tell me where.

With no extra configuration, qlogin (and qrsh) work.

Now I add the following configuration to the host:

[kdf51254@pc030 tmp]$ qconf -sconf cs04r-sc-com04-15
#cs04r-sc-com04-15.diamond.ac.uk:
xterm             /usr/bin/xterm
qlogin_daemon     /usr/sbin/sshd -i
rsh_daemon        /usr/sbin/sshd -i
rlogin_daemon     /usr/sbin/sshd -i
qlogin_command    /dls_sw/apps/sge/SGE6.2/DLS/common/qlogin_wrapper
rsh_command       /usr/bin/ssh -X
rlogin_command    /usr/bin/ssh -X

with /dls_sw/apps/sge/SGE6.2/DLS/common/qlogin_wrapper reading:

[kdf51254@pc030 ~]$ cat /dls_sw/apps/sge/SGE6.2/DLS/common/qlogin_wrapper
#!/bin/sh
HOST=$1
PORT=$2
/usr/bin/ssh -X -p $PORT $HOST
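
The wrapper relies on SGE calling qlogin_command with the remote host and port as its two arguments. A slightly more defensive sketch of the same idea follows; it is hypothetical and not the configuration tested here. The function name qlogin_connect and the SSH_CMD override (useful for a dry run that does not actually connect) are made up for illustration.

```sh
#!/bin/sh
# Hypothetical, more defensive variant of qlogin_wrapper (a sketch, not the
# tested configuration). Assumes SGE passes exactly two arguments: host, port.
qlogin_connect() {
    if [ "$#" -ne 2 ]; then
        echo "usage: qlogin_connect host port" >&2
        return 2
    fi
    host=$1
    port=$2
    # Refuse anything that is not a plain decimal port number.
    case $port in
        ''|*[!0-9]*)
            echo "qlogin_connect: port is not numeric: $port" >&2
            return 2
            ;;
    esac
    # SSH_CMD is a hypothetical override for dry runs; it defaults to the real
    # client and is left unquoted so that e.g. SSH_CMD="echo ssh" splits into words.
    ${SSH_CMD:-/usr/bin/ssh} -X -p "$port" "$host"
}

# Installed as the actual qlogin_command script, the file would end with:
# qlogin_connect "$@"
```

For a dry run, SSH_CMD=echo qlogin_connect cs04r-sc-com04-15 40123 only prints the arguments that would be handed to ssh (40123 is an arbitrary example port).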

Now when I try a qlogin again, this is all I get:

[kdf51254@pc030 ~]$ qlogin -q test.q
Your job 1180935 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1180935 has been successfully scheduled.

Your interactive job 1180935 has been successfully scheduled.

Your interactive job 1180935 has been successfully scheduled.

(goes on for a bit more and then the node goes into error state)

I see the job in qstat:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
test.q@cs04r-sc-com04-15.diamo BIP   0/1/4          0.03     lx24-amd64
1180935 0.50500 QLOGIN     kdf51254     r     02/03/2010 16:07:52     1

and the only processes on the node are:

sgeadmin root     /dls_sw/apps/sge/SGE6.2/bin/lx24-amd64/sge_execd
sgeadmin root      \_ sge_shepherd-1180978 -bg
root     root          \_ sge_shepherd-1180978 -bg

(Not literally the only processes, obviously, but the only SGE-related ones!)
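
Comparing this listing with Reuti's working one further down (there, sshd sessions hang below the shepherd; here, nothing does) can be scripted. The sketch below reads the forest output of ps -e f -o user,ruser,command, the command Reuti used, and treats lines whose "\_" tree marker sits further right than a shepherd line's marker as that shepherd's descendants. The helper name check_qlogin_tree is hypothetical, and this is a rough text check rather than a real parent/child lookup.

```sh
#!/bin/sh
# check_qlogin_tree: hypothetical helper. Reads the output of
#   ps -e f -o user,ruser,command
# on stdin and reports whether an "sshd: " line appears inside the subtree of
# an sge_shepherd line, judged by the column of the "\_" tree marker.
check_qlogin_tree() {
    awk '
        # Column where the "\_" forest marker starts (0 if the line has none).
        { indent = index($0, "\\_") }
        # A line at or left of the shepherd marker ends the shepherd subtree.
        in_shepherd && indent <= shep_indent { in_shepherd = 0 }
        # Enter the subtree of the outermost shepherd.
        /sge_shepherd/ && !in_shepherd { in_shepherd = 1; shep_indent = indent; next }
        in_shepherd && /sshd: / { found = 1 }
        END {
            if (found) print "sshd started below sge_shepherd"
            else       print "no sshd below sge_shepherd"
        }
    '
}
```

On the execution node this would be run as ps -e f -o user,ruser,command | check_qlogin_tree while the qlogin is hanging.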

qrsh doesn't work either, with similar symptoms: I don't get any reply
to my qrsh request, and the processes on the node look the same.

After removing the configuration again (qconf -dconf), qlogin and qrsh work again.

Also, is it normal that I seem to have to ssh into the node and restart
sge_execd whenever I change the config (using qconf -mconf)? It never
seems to pick up the changes, and I thought it should.

Tina

rumpelkeks wrote:
> Hi,
> 
> <snip>
>> Are you running SELinux on the nodes?
> 
> Nope. We're running Lustre, and SELinux is disabled everywhere. But yes,
> otherwise that would've been my first guess as well, and I actually went
> and double-checked :)
> 
>> Normal qsub is working?
> 
> Oh yes! qlogin using builtin works fine as well.
> 
>>>>> <snip>
>>> Sorry, might've gotten confusing here. In this first instance I'd be
>>> quite happy to get a login. My problem is that it doesn't work at all.
>>> (I mean even if the X forwarding fails, I should just get a
>>> shell/prompt/something like that, or not?)
>> Yep.
> 
> Which is what I don't get (as I guess you guessed). I did try to enable
> debug output in ssh, but I don't even get that far. Usually the job gets
> scheduled and you get 'establishing builtin session'; I never get the
> 'establishing session' message.
> 
> Also, on the node (watching netstat), I never see any connection state
> other than TIME_WAIT on the port that the job reports.
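
The netstat observation above (nothing but TIME_WAIT on the job's port) can be made repeatable with a small filter. This sketch counts TCP connection states whose local address ends in a given port, reading netstat -tan output and assuming the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The helper name port_states is hypothetical.

```sh
#!/bin/sh
# port_states PORT: hypothetical helper. Reads `netstat -tan` output on stdin
# and prints "STATE COUNT" for TCP sockets whose local address ends in :PORT.
# Assumes the column layout: Proto Recv-Q Send-Q Local Foreign State.
port_states() {
    awk -v suffix=":$1" '
        $1 ~ /^tcp/ {
            addr = $4
            # Compare the tail of the local address with ":PORT" exactly.
            tail = substr(addr, length(addr) - length(suffix) + 1)
            if (tail == suffix) count[$6]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

For example, netstat -tan | port_states 40123, with 40123 standing in for the port the job reports; running it once during a builtin qlogin and once during the ssh-based one makes the two easy to compare.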
> 
> (Oh, and when I have a 'normal' ssh session open into the node,
> qsh -display :10.0 works, so I guess the 'connection command' works. To
> me, it looks very much as if no sshd process is ever started when I try
> to connect.)
> 
>>>>> <snip>
>>> I know what user SGE runs as. I want to know which user it would try
>>> to start the ssh process as: the user who wants to log in, or the user
>>> SGE runs as? Does it have setuid on something? (The problem could very
>>> well be that the user running SGE (sgeadmin) is not allowed to start
>>> an sshd process.) Also, when I try this directly, I cannot run
>>> "sshd -i"; is this required to work, or can it be used without being
>>> run from inetd?
>> It's not run from inetd:
>>
>> sgeadmin root     /usr/sge/bin/lx24-x86/sge_execd
>> sgeadmin root      \_ sge_shepherd-409 -bg
>> root     root          \_ sshd: reuti [priv]
>> reuti    reuti             \_ sshd: reuti@pts/0
>> reuti    reuti                 \_ -bash
>> reuti    reuti                     \_ ps -e f -o user,ruser,command
> 
> Okay. So it IS running sshd as root, even though the execd runs as
> sgeadmin. (I also have a group sgeadmin, I think; that should not be a
> problem?)
> 
> I've gone back to builtin and I'll go through all the steps again; maybe
> I can find some more information on what's going wrong where.
> 
> Tina
> 
>> -- Reuti
>>
>>
>>> Tina
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=242659
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
> 
> 


-- 
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=242953
