[GE users] SSH connection refused - intermittent

giftedplacebo aeverett at forteds.com
Mon Mar 22 14:26:49 GMT 2010


Hi,

All machines are running sge_execd as user sgeadmin (an account we created for running all our sge_ processes).

The problem occurs on all machines.

Since my original email, I bumped ConnectionAttempts from 20 to 50, and the errors have gone away, but I believe it is only masking the problem by allowing ssh more attempts to try to connect. I'd like to solve the underlying problem.
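
A rough sketch of why more attempts could hide the fault rather than fix it. It assumes, purely for illustration, that each attempt is refused independently with the same probability; the function names are hypothetical and none of this is measured data beyond the 5-9% figure and the 20/50 settings already mentioned.

```python
# Illustrative only: assumes every connection attempt is refused
# independently with the same probability, which may not hold here.

def overall_failure_rate(per_attempt_refusal: float, attempts: int) -> float:
    """Chance that all `attempts` tries are refused, so ssh gives up."""
    return per_attempt_refusal ** attempts


def implied_per_attempt_refusal(overall_failure: float, attempts: int) -> float:
    """Per-attempt refusal probability that would produce `overall_failure`."""
    return overall_failure ** (1.0 / attempts)


# 5% and 9% are the failure rates reported with ConnectionAttempts 20.
for observed in (0.05, 0.09):
    p = implied_per_attempt_refusal(observed, 20)
    print(
        f"observed {observed:.0%} at 20 attempts -> per-attempt refusal {p:.1%}, "
        f"expected failure at 50 attempts {overall_failure_rate(p, 50):.3%}"
    )
```

Under that assumption, a 5-9% failure rate after 20 attempts would require most individual attempts to be refused, which seems unlikely. The refusals may instead come in bursts that the longer retry window of 50 attempts simply outlasts, though that is a guess.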

Best regards,
Aaron


On Sat, Mar 20, 2010 at 6:30 PM, reuti <reuti at staff.uni-marburg.de> wrote:
Hi,

On 15.03.2010 at 16:01, giftedplacebo wrote:

> We have been running sge for several years now, currently running
> 6.0u10. We recently started seeing lots of ssh failures (5-9%) like
> the following:
>
> ssh: connect to host grid057.<mydomain>.com port 45364: Connection
> refused

Is the execd running as a non-root user on some machines? Or is this
happening on all machines in the cluster, not only on certain ones?

-- Reuti


> grid057 appears to accept the connection; this is the corresponding
> /var/log/messages entry:
>
> Mar  9 06:36:15 grid057 sshd[14936]: Accepted publickey for
> <username> from 172.16.14.157 port 45364 ssh2
>
> (<mydomain> and <username> have been removed for privacy.)
>
> On all grid nodes I have selinux and iptables disabled.
>
> sshd is running with the following /etc/ssh/sshd_config
>
> X11Forwarding yes
> PrintMotd no
> MaxStartups 10000:1:10000
> Subsystem       sftp    /usr/libexec/openssh/sftp-server
>
> /etc/ssh/ssh_config on all nodes is:
>
> Host *
>    RhostsRSAAuthentication yes
>    StrictHostKeyChecking no
>    ConnectionAttempts 20
>
> I have also set the following to 3000:
>
> /proc/sys/net/core/netdev_max_backlog
> /proc/sys/net/core/somaxconn
>
> The problem is across all machines, and only affects ~5-9% of ssh
> connections. I don't see any error messages on the machines, just
> the ssh failure notice in our job log files. Does anyone have ideas
> or tips on tuning ssh/sshd? Thanks!
>
> Best regards,
> Aaron
>
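
Since the only visible symptom is the ssh failure notice in the job log files, one way to check whether the refusals cluster on particular hosts is to tally them per host. This is a minimal sketch, assuming each notice sits on a single line in the format quoted above; `count_refused` is a hypothetical helper, and how the job log lines are collected is left to the site.

```python
import re
from collections import Counter

# Matches the failure notice quoted in this thread, e.g.
# "ssh: connect to host grid057.<mydomain>.com port 45364: Connection refused"
REFUSED = re.compile(r"ssh: connect to host (\S+) port \d+: Connection refused")


def count_refused(lines):
    """Return a Counter mapping host name -> number of refused connections."""
    hosts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            hosts[match.group(1)] += 1
    return hosts


# Example with the notice from this thread plus one unrelated line.
sample = [
    "ssh: connect to host grid057.<mydomain>.com port 45364: Connection refused",
    "job started",
]
for host, refused in count_refused(sample).most_common():
    print(f"{host}\t{refused}")
```

A roughly even spread across hosts would be consistent with the report that the problem affects all machines; a skewed spread would point at specific nodes.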
