[GE users] SSH connection refused - intermittent

giftedplacebo aeverett at forteds.com
Mon Mar 15 15:01:06 GMT 2010


Hi all,

We have been running SGE for several years and are currently on 6.0u10. We recently started seeing a significant rate of ssh failures (5-9% of connections), like the following:

ssh: connect to host grid057.<mydomain>.com port 45364: Connection refused

grid057 appears to accept the connection; this is the corresponding /var/log/messages entry:

Mar  9 06:36:15 grid057 sshd[14936]: Accepted publickey for <username> from 172.16.14.157 port 45364 ssh2

(<mydomain> and <username> have been removed for privacy.)

SELinux and iptables are disabled on all grid nodes.

sshd is running with the following /etc/ssh/sshd_config:

X11Forwarding yes
PrintMotd no
MaxStartups 10000:1:10000
Subsystem       sftp    /usr/libexec/openssh/sftp-server
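
For readers unfamiliar with the MaxStartups line: as documented in sshd_config(5), the "start:rate:full" form makes sshd begin refusing unauthenticated connections with probability rate% once "start" of them are pending, rising linearly to 100% at "full". The following is a minimal Python sketch of that rule, not sshd's actual code; the function names are made up for illustration.

```python
def parse_max_startups(value):
    """Split a 'start:rate:full' MaxStartups value into three ints."""
    start, rate, full = (int(part) for part in value.split(":"))
    return start, rate, full


def drop_probability(unauthenticated, start, rate, full):
    """Approximate chance (0.0-1.0) that a new connection is refused.

    Below `start` nothing is dropped; at or above `full` everything is.
    In between, the chance rises linearly from rate% toward 100%.
    """
    if unauthenticated < start:
        return 0.0
    if unauthenticated >= full:
        return 1.0
    span = full - start
    return (rate + (100 - rate) * (unauthenticated - start) / span) / 100.0


# The setting from the sshd_config above.
start, rate, full = parse_max_startups("10000:1:10000")
for pending in (10, 9999, 10000):
    print(pending, drop_probability(pending, start, rate, full))
```

Under this model, 10000:1:10000 drops nothing until 10000 unauthenticated connections are pending at once, so MaxStartups by itself seems an unlikely cause of a 5-9% refusal rate.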

/etc/ssh/ssh_config on all nodes is:

Host *
   RhostsRSAAuthentication yes
   StrictHostKeyChecking no
   ConnectionAttempts 20
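
To check whether refusals happen at the TCP level, independent of ssh itself, a small probe along these lines might help. It is only a sketch that loosely mirrors ConnectionAttempts (one try per second, up to 20 tries); the function name, defaults, and the host/port you pass in are assumptions, not anything taken from SGE or OpenSSH.

```python
import socket
import time


def try_connect(host, port, attempts=20, delay=1.0, timeout=5.0):
    """Return the 1-based attempt number that connected, or None if all failed."""
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError:
            # Wait before retrying, but not after the final attempt.
            if attempt < attempts:
                time.sleep(delay)
    return None
```

A return value greater than 1 would mean the first attempt failed and a retry got through; None would mean every attempt was refused or timed out.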

I have also set the following to 3000:

/proc/sys/net/core/netdev_max_backlog
/proc/sys/net/core/somaxconn
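
Since values written under /proc/sys do not survive a reboot unless they are also set persistently, it may be worth confirming that every node still reports 3000. A minimal sketch follows, assuming Python is available on the nodes; the function name and the `root` parameter (there only so the check can be pointed at a test directory) are hypothetical.

```python
from pathlib import Path

# The two settings mentioned above, relative to /proc/sys.
EXPECTED = {
    "net/core/netdev_max_backlog": 3000,
    "net/core/somaxconn": 3000,
}


def check_sysctls(expected=EXPECTED, root="/proc/sys"):
    """Return {name: (actual, wanted)} for every setting that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        path = Path(root) / name
        try:
            actual = int(path.read_text().strip())
        except (OSError, ValueError):
            actual = None  # unreadable, or not a plain integer
        if actual != wanted:
            mismatches[name] = (actual, wanted)
    return mismatches
```

Calling check_sysctls() on a node returns an empty dict when both values are 3000, and otherwise lists what was found next to what was expected.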

The problem occurs on all machines and affects only ~5-9% of ssh connections. I don't see any error messages on the machines themselves, just the ssh failure notice in our job log files. Does anyone have ideas or tips on tuning ssh/sshd? Thanks!
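
For anyone wanting to reproduce the 5-9% figure, or to see whether some hosts are hit more than others, the job logs can be tallied roughly as follows. This is a sketch under one loud assumption: each job log corresponds to exactly one ssh connection attempt. The function names are illustrative, and the regular expression simply matches the error line quoted earlier in this message.

```python
import re
from collections import Counter

# Matches lines like:
#   ssh: connect to host <host> port <port>: Connection refused
REFUSED = re.compile(
    r"ssh: connect to host (?P<host>\S+) port (?P<port>\d+): Connection refused"
)


def refused_hosts(log_text):
    """Return the hosts named in 'Connection refused' lines of one log."""
    return [m.group("host") for m in REFUSED.finditer(log_text)]


def summarize(logs):
    """Return (failure_fraction, per_host_counts) over an iterable of log texts.

    Assumes each log corresponds to exactly one ssh connection attempt.
    """
    total = 0
    failed = 0
    per_host = Counter()
    for text in logs:
        total += 1
        hosts = refused_hosts(text)
        if hosts:
            failed += 1
            per_host.update(hosts)
    fraction = failed / total if total else 0.0
    return fraction, per_host
```

Feeding it the text of each job log gives the fraction of logs containing a refusal plus a per-host count, which would show whether the failures really are spread evenly across the grid.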

Best regards,
Aaron



