[GE users] qrsh config?

reuti reuti at staff.uni-marburg.de
Fri Dec 19 19:03:39 GMT 2008


On 19.12.2008 at 19:27, Alex Chekholko wrote:

> <snip>
>>> qlogin_daemon                /usr/sbin/sshd -i
>>> rlogin_daemon                /usr/sbin/sshd -i
>>> rsh_daemon                   /usr/sbin/sshd -i
>>> rsh_command                  /usr/bin/ssh -o StrictHostChecking=no
>>> rlogin_command               /usr/bin/ssh -o StrictHostChecking=no
>>
>> Maybe a typo: StrictHostKeyChecking
>
> Good catch! I changed that. It may or may not have helped: qrsh
> now works as root (maybe it did before), but it still does not work
> as a regular user:

Was sge_execd started by root?

$ ps -e f -o user,ruser,command
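To pick the execution daemon out of that listing, a small filter along these lines might help. This is only a sketch: the function name check_execd_owner is made up for illustration, and it assumes the plain `ps -e -o user,ruser,command` layout (without the `f` tree option), so that the third column is the command path.

```shell
# Hypothetical helper (not part of SGE): reads "user ruser command" lines,
# as printed by `ps -e -o user,ruser,command`, on stdin and reports which
# user owns the sge_execd process.
check_execd_owner() {
    awk '$3 ~ /sge_execd/ {
        found = 1
        if ($1 == "root")
            print "ok: sge_execd runs as root"
        else
            print "warning: sge_execd runs as " $1
    }
    END { if (!found) print "sge_execd not found" }'
}

# On the exec node (not run here):
#   ps -e -o user,ruser,command | check_execd_owner
```

If the daemon turns out to run as a non-root user, that would be one thing to look at before digging further into the ssh settings.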

> [chekh at beta.genomics.upenn.edu] ~ [0]
> $ qrsh uname -a
> error: error reading returncode of remote command
> [chekh at beta.genomics.upenn.edu] ~ [1]
> $ qconf -sconf |grep ssh
> qlogin_daemon                /usr/sbin/sshd -i

For qlogin with ssh this might help:

http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html

> rlogin_daemon                /usr/sbin/sshd -i
> rsh_daemon                   /usr/sbin/sshd -i
> rsh_command                  /usr/bin/ssh -o StrictHostKeyChecking=no
> rlogin_command               /usr/bin/ssh -o StrictHostKeyChecking=no
> [chekh at beta.genomics.upenn.edu] ~ [0]
> $ qrsh hostname
> error: error reading returncode of remote command
>
> I see the command show up in qstat as running, and when I look on  
> the node I see:
>
> root      5549  0.0  0.0  88392  4456 ?        S    Dec04  17:02 /gpfs/fs0/share/ge-6.1u3/bin/lx24-amd64/sge_execd
> root     19322  0.0  0.0  32828  3356 ?        S    13:20   0:00  \_ sge_shepherd-1178088 -bg
> root     19323  0.0  0.0  33340  2908 ?        Ss   13:20   0:00      \_ sge_shepherd-1178088 -bg
>
> and then in the log after they disappear:
> Dec 19 13:21:51 node-r1-u1-c34-p10-o2 kernel: sge_shepherd[19322]: segfault at 0000000000000001 rip 00002ae3c44087a7 rsp 00007fffe6fba440 error 4
> Dec 19 13:21:51 node-r1-u1-c34-p10-o2 kernel: sge_shepherd[19323]: segfault at 0000000000000001 rip 00002ae3c44087a7 rsp 00007fffe6fbcce0 error 4

Is there any output from these daemons in /tmp on the node?
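One way to look for such files right after a failed qrsh is to list what was recently modified in /tmp. The helper below is a generic sketch, not an SGE tool: the name recent_files is invented here, no particular file names are assumed, and it relies on the `-maxdepth` and `-mmin` options of GNU find.

```shell
# Hypothetical helper: list regular files directly under a directory that
# were modified within the last N minutes (uses GNU find options).
recent_files() {
    dir=$1
    minutes=$2
    find "$dir" -maxdepth 1 -type f -mmin "-$minutes" | sort
}

# On the exec node, shortly after the failing qrsh (not run here):
#   recent_files /tmp 10
```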

-- Reuti


>
> Any suggestions?
>
> [root at node-r1-u1-c34-p10-o2.local] ~ [0]
> # uname -a
> Linux node-r1-u1-c34-p10-o2.local 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>
> [root at beta.genomics.upenn.edu] ~ [0]
> # qrsh hostname
> node-r1-u12-c23-p11-o10.local
> [root at beta.genomics.upenn.edu] ~ [0]
> # qrsh hostname
> node-r1-u14-c21-p11-o11.local
> [root at beta.genomics.upenn.edu] ~ [0]
> # qrsh hostname
> node-r1-u19-c16-p10-o14.local
>
> Regards,
> Alex
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93422
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93426

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


