[GE users] Help configuring grid to use ssh instead of rsh

Reuti reuti at staff.uni-marburg.de
Fri Apr 8 00:56:06 BST 2005



Okay, I just saw Rayson's reply.

Anyway, regarding the network traffic: is $SGE_ROOT also shared, or is it local
to each node? Keeping it local would at least take SGE out of the NFS traffic
(e.g. with the SGE spool directory on each node in /var/spool/sge, as I like it;
there is a Howto on reducing NFS traffic).
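
A minimal sketch for checking this, assuming the global configuration has been
dumped with `qconf -sconf` and that it prints plain "name value" lines. The
helper names and the sample paths are made up for illustration and are not
taken from anyone's real setup; wrapped continuation lines are not handled.

```python
def parse_sge_conf(text):
    """Parse 'name value' lines (as printed by `qconf -sconf`) into a dict."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        # Skip headers such as "global:" and comment lines.
        if len(parts) == 2 and not parts[0].startswith("#"):
            conf[parts[0]] = parts[1].strip()
    return conf


def spool_is_under(conf, shared_prefixes):
    """Return True if execd_spool_dir lies below one of the given prefixes."""
    spool = conf.get("execd_spool_dir", "")
    return any(
        spool == p or spool.startswith(p.rstrip("/") + "/")
        for p in shared_prefixes
    )


# Hypothetical sample output; real values will differ per site.
SAMPLE = """global:
execd_spool_dir              /opt/sge/default/spool
rsh_command                  /usr/bin/ssh
rsh_daemon                   /usr/sbin/sshd -i
"""

conf = parse_sge_conf(SAMPLE)
# "/opt/sge" stands in for whatever path is NFS-mounted on the nodes.
if spool_is_under(conf, ["/opt/sge"]):
    print("execd spool directory is on the shared tree:", conf["execd_spool_dir"])
```

If the spool directory turns out to be on the shared tree, moving it to a local
path such as /var/spool/sge is the change meant above.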

CU - Reuti


Quoting Gary Thomas <gthomas at ForteDS.com>:

> The jobs generate heavy network traffic because they are typically
> compiling sources off of a network file server.  Each machine has 1
> network card.
> 
> Ssh is set up for passwordless login and SGE 5.3p4 is configured as
> described for ssh.
> 
> We want to increase the number of nodes to 100+ in the near future and
> I'm concerned that we might run into more serious problems.
> 
> GT
> 
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Thursday, April 07, 2005 3:34 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Help configuring grid to use ssh instead of rsh
> 
> Hi,
> 
> Quoting Gary Thomas <gthomas at ForteDS.com>:
> 
> > Hi, We have a grid setup with 30+ machines, and we've been having a
> > lot of problems lately with "poll" failures and "unable to read
> > return code" failures. I'm trying to switch
> 
> With this number of nodes there shouldn't be any problem like this.
> Are your jobs generating heavy network traffic? One or two network
> cards in each machine?
> 
> > 
> > over to using ssh to see if it has fewer problems, but I can't seem to
> > get it to work consistently.
> > 
> 
> Did you set up a passwordless login for ssh and configure SGE according
> to the Howto at sunsource.net? Which platform and SGE version?
> Classic spooling (to NFS/local) or BDB?
> 
> Cheers - Reuti
> 
> >  
> > 
> > I keep getting intermittent errors like this:
> > 
> >  
> > 
> > testing rdgrid08.q
> > 
> > ssh: connect to address 172.16.2.48 port 38916: Connection refused
> > 
> >  
> > 
> > If I ^C at this point I get:
> > 
> >  
> > 
> > error: error waiting on socket for client to connect: Interrupted system call
> > 
> > error: error reading returncode of remote command
> > 
> >  
> > 
> > Is anyone else using ssh, or are there some settings we can tweak for
> > rsh to avoid the "poll" and "unable to read return code" errors?
> > 
> >  
> > 
> > Thanks,
> > 
> >  
> > 
> > GT
> > 
> > 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
