[GE users] qrsh config?

Alex Chekholko chekh at pcbi.upenn.edu
Fri Dec 19 18:27:38 GMT 2008


On Fri, 19 Dec 2008 18:42:33 +0100
reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
> 
> On 19.12.2008 at 18:38, Alex Chekholko wrote:
> 
> > This is SGE 6.1u3 on EL5.2 on x86_64.
> >
> > I think the 'qrsh' command on my cluster doesn't work, and I'm trying
> > to figure out why.
> >
> > I followed this guide:
> > http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
> >
> > Users on our cluster use 'qlogin' when they want an interactive shell.
> > SGE chooses an available slot, connects to that node and drops the user
> > at the prompt.  That's working fine and always has.
> >
> > I've never really used qrsh before and it does the following:
> >
> > [chekh at beta.genomics.upenn.edu] ~/mpi [1]
> > $ qrsh node-r2-u10-c25-p13-o6.local
> > error: error reading returncode of remote command
> > [chekh at beta.genomics.upenn.edu] ~/mpi [1]
> > $ qrsh
> > [chekh at beta.genomics.upenn.edu] ~/mpi [1]
> > $ qrsh uname -a
> > error: error reading returncode of remote command
> >
> > If I type just "qrsh" it returns immediately with an error.  If I give
> > it any kind of argument, it takes a while.
> >
> > Is the command "qrsh <nodename>" supposed to work?
> 
> you have to supply a command, not a nodename. E.g.:
> 
> $ qrsh hostname
> 
> $ qrsh date
> 
> $ qrsh ls
> 
> > [root at beta.genomics.upenn.edu] ~ [0]
> > # qconf -sconf|grep ssh
> > qlogin_daemon                /usr/sbin/sshd -i
> > rlogin_daemon                /usr/sbin/sshd -i
> > rsh_daemon                   /usr/sbin/sshd -i
> > rsh_command                  /usr/bin/ssh -o StrictHostChecking=no
> > rlogin_command               /usr/bin/ssh -o StrictHostChecking=no
> 
> Maybe a typo: StrictHostKeyChecking

Good catch! I changed that, though I can't tell whether it helped. qrsh now
works as root (it may have before), but it still fails as a regular user:

[chekh at beta.genomics.upenn.edu] ~ [0] 
$ qrsh uname -a
error: error reading returncode of remote command
[chekh at beta.genomics.upenn.edu] ~ [1] 
$ qconf -sconf |grep ssh
qlogin_daemon                /usr/sbin/sshd -i
rlogin_daemon                /usr/sbin/sshd -i
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh -o StrictHostKeyChecking=no
rlogin_command               /usr/bin/ssh -o StrictHostKeyChecking=no
[chekh at beta.genomics.upenn.edu] ~ [0] 
$ qrsh hostname
error: error reading returncode of remote command
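
A misspelled option name like the StrictHostChecking one is easy to miss by eye, so here is a minimal sketch of a check for it. It reads "qconf -sconf"-style text and flags any ssh "-o Name=value" option whose name is not in a short allowlist. The names check_ssh_opts and KNOWN_OPTS are made up for this example, and the allowlist is only an illustrative subset of ssh_config keywords, not the full set.

```shell
#!/bin/sh
# Illustrative subset of ssh_config keywords (assumption: extend as needed).
KNOWN_OPTS="StrictHostKeyChecking BatchMode ConnectTimeout UserKnownHostsFile"

# Hypothetical helper: reads qconf-style "name value..." lines on stdin and
# prints one line for every ssh -o option whose name is not in KNOWN_OPTS.
check_ssh_opts() {
    awk -v known="$KNOWN_OPTS" '
        BEGIN {
            n = split(known, k, " ")
            for (i = 1; i <= n; i++) ok[k[i]] = 1
        }
        /ssh/ {
            # Field 1 is the parameter name; scan the rest for "-o Name=value".
            for (i = 2; i < NF; i++) {
                if ($i == "-o") {
                    split($(i + 1), kv, "=")
                    if (!(kv[1] in ok))
                        print $1 ": unknown ssh option " kv[1]
                }
            }
        }'
}
```

It would be used as "qconf -sconf | check_ssh_opts". With the corrected configuration shown above it should print nothing.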

I see the command show up in qstat as running, and when I look on the node I see:

root      5549  0.0  0.0  88392  4456 ?        S    Dec04  17:02 /gpfs/fs0/share/ge-6.1u3/bin/lx24-amd64/sge_execd
root     19322  0.0  0.0  32828  3356 ?        S    13:20   0:00  \_ sge_shepherd-1178088 -bg
root     19323  0.0  0.0  33340  2908 ?        Ss   13:20   0:00      \_ sge_shepherd-1178088 -bg

and then, after they disappear, the kernel log shows:
Dec 19 13:21:51 node-r1-u1-c34-p10-o2 kernel: sge_shepherd[19322]: segfault at 0000000000000001 rip 00002ae3c44087a7 rsp 00007fffe6fba440 error 4
Dec 19 13:21:51 node-r1-u1-c34-p10-o2 kernel: sge_shepherd[19323]: segfault at 0000000000000001 rip 00002ae3c44087a7 rsp 00007fffe6fbcce0 error 4
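
To compare these crashes across nodes, a small filter like the following could help. It is only a sketch: it assumes the kernel message format shown in the two lines above, and the function name shepherd_segfaults is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog-style lines on stdin and prints the pid,
# fault address and instruction pointer of each sge_shepherd segfault.
# Lines that do not match the format above are silently skipped.
shepherd_segfaults() {
    sed -n 's/.*kernel: sge_shepherd\[\([0-9]*\)]: segfault at \([0-9a-f]*\) rip \([0-9a-f]*\) .*/pid=\1 addr=\2 rip=\3/p'
}
```

It reads from stdin, so it can be fed from whichever file syslog writes kernel messages to on the node (the location depends on the syslog configuration). For the two lines above it reports the same addr and rip for both shepherd processes.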

Any suggestions?  

[root at node-r1-u1-c34-p10-o2.local] ~ [0] 
# uname -a
Linux node-r1-u1-c34-p10-o2.local 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

[root at beta.genomics.upenn.edu] ~ [0] 
# qrsh hostname
node-r1-u12-c23-p11-o10.local
[root at beta.genomics.upenn.edu] ~ [0] 
# qrsh hostname
node-r1-u14-c21-p11-o11.local
[root at beta.genomics.upenn.edu] ~ [0] 
# qrsh hostname
node-r1-u19-c16-p10-o14.local

Regards,
Alex

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93422
