[GE users] dumb ROCKS 5.3 w/ SGE roll question (qrsh not working)

mhanby mhanby at uab.edu
Thu Aug 19 15:24:04 BST 2010


Rocks has a bug in its SGE/OGE build process: it adds the following lines to the global configuration:

qrsh_command                 /usr/bin/ssh
rsh_command                  /usr/bin/ssh
rlogin_command               /usr/bin/ssh

Simply run 'qconf -mconf' and delete these lines; they override the 'builtin' settings further up in the configuration.
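
For example (a sketch; 'qconf -sconf' prints the global configuration, 'qconf -mconf' opens it in $EDITOR):

# show the current remote-startup settings
qconf -sconf | egrep 'qrsh_command|rsh_command|rlogin_command'

# open the global configuration and delete (or set to 'builtin') the three lines above
qconf -mconf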

Mike

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Thursday, August 19, 2010 6:32 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] dumb ROCKS 5.3 w/ SGE roll question (qrsh not working)

Am 19.08.2010 um 13:23 schrieb laotsao:

> u4 in rocks 5.3
> by default the rsh-server is not installed on the compute nodes or the frontend

Aha. Then it is sufficient to install the rshd on all nodes (this is what `qrsh` uses by default) and add the frontend to /etc/hosts.equiv. But it's not necessary to have the rshd running all the time; in fact, it can be disabled in /etc/xinetd.d/rshd ("disable = yes"), because OGE starts a dedicated instance on a random port for each login anyway.
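
For reference, a typical xinetd entry for the rsh service looks roughly like this (on Red Hat-based systems the file is usually /etc/xinetd.d/rsh; paths and option names may vary by distribution):

service shell
{
        disable         = yes
        socket_type     = stream
        wait            = no
        user            = root
        server          = /usr/sbin/in.rshd
}

And /etc/hosts.equiv simply lists the trusted hostnames, one per line, e.g. the frontend's name.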

Or 'builtin' can be used as the communication method inside OGE.
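
With the builtin method (assuming a Grid Engine version recent enough to support it), the relevant entries in 'qconf -mconf' would read:

qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin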

-- Reuti


> Rocks uses sshd for most commands
> regards
> 
> 
> On 8/19/2010 5:37 AM, reuti wrote:
>> Hi,
>> 
>> Am 19.08.2010 um 04:34 schrieb craffi:
>> 
>>> For the first time in a long while I'm working on a cluster built using
>>> the ROCKS kit.
>>> 
>>> It's the latest ROCKS 5.3 with the SGE roll.
>>> 
>>> In the standard install, the SGE qrsh command ("qrsh hostname") fails
>>> like this:
>>> 
>>>> error: error: ending connection before all data received
>>>> error:
>>>> error reading job context from "qlogin_starter"
>>> 
>>> Just wondering if this is something the ROCKS people have long been
>>> familiar with. qlogin and qsub work as expected.
>> Which version of OGE, and what is the setting for "rsh_daemon"/"rsh_command"? If `qrsh <command>` is not working, I think parallel jobs under a tight integration will fail too. Or does the problem occur only from the headnode to the exechosts? Parallel jobs usually communicate only between the exechosts.
>> 
>> -- Reuti
>> 
>> 
>>> -Chris
>>> 
