[GE users] ssh key problems

Schenker, Martin MSchenker at illumina.com
Fri Mar 9 14:00:33 GMT 2007

Thanks! We actually had three problems:
-wrong file owner for the new nodes in the spool dir
-two boxes out of 6 needed to be added with the FQDN
-file permissions on authorized_keys weren't set to 600 on all nodes.
Now it works for all 16 nodes!
Thanks for the input, sometimes you can't see the obvious...
Best, Martin
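
For anyone hitting the same thing: the owner and permission checks from the list above can be scripted. The following is only a minimal sketch in Python, not something from the original setup. The function name check_file and the default of mode 600 are assumptions for illustration, and the path in the usage comment is hypothetical.

```python
import os
import stat


def check_file(path, expected_mode=0o600, expected_uid=None):
    """Return a list of human-readable problems found for `path`.

    Compares the permission bits with `expected_mode` and, if
    `expected_uid` is given, the numeric owner with `expected_uid`.
    """
    problems = []
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits only, no file type
    if mode != expected_mode:
        problems.append(
            f"{path}: mode is {mode:o}, expected {expected_mode:o}"
        )
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(
            f"{path}: owner uid is {st.st_uid}, expected {expected_uid}"
        )
    return problems


# Usage (hypothetical path, run as the user the jobs run under):
#   for p in check_file(os.path.expanduser("~/.ssh/authorized_keys"),
#                       expected_uid=os.getuid()):
#       print(p)
```

Running this on every node as the job user gives an empty list where things are fine. The same function can be pointed at files in the spool directory by passing expected_uid and whatever mode is appropriate there.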

-----Original Message-----
From: GARDAIS Ionel [mailto:Ionel.Gardais at tech-advantage.com]
Sent: 08 March 2007 16:03
To: users at gridengine.sunsource.net
Subject: RE : [GE users] ssh key problems

Hi there,

Have you tried to ssh to the nodes using the FQDN machine name?
I had to do this for my setup, so that both machine and machine.domain are listed in the known_hosts file.
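
A quick way to see which names are missing is to scan the known_hosts file for both forms. The following is a rough Python sketch under some assumptions: the entries are not hashed (lines starting with "|1|" cannot be read back as names and are skipped), lines with a marker such as @cert-authority are skipped, and bracketed [host]:port entries are not handled. The function names and the example.com domain are made up for illustration.

```python
def hosts_in_known_hosts(text):
    """Collect host names from unhashed known_hosts content."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, marker lines and hashed entries.
        if not line or line[0] in "#@" or line.startswith("|1|"):
            continue
        # The first field is a comma-separated list of names/addresses.
        names.update(line.split()[0].split(","))
    return names


def missing_names(text, short_name, domain):
    """Return whichever of short_name and short_name.domain is not listed."""
    wanted = [short_name, f"{short_name}.{domain}"]
    listed = hosts_in_known_hosts(text)
    return [name for name in wanted if name not in listed]


# Usage (hypothetical host and domain):
#   with open("/path/to/known_hosts") as fh:
#       print(missing_names(fh.read(), "node11", "example.com"))
```

Any name that is reported as missing can then be added by doing one interactive ssh to that exact name and accepting the host key.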


-------- Original message --------
From: Schenker, Martin [mailto:MSchenker at illumina.com]
Date: Thu 08/03/2007 16:47
To: users at gridengine.sunsource.net
Subject: [GE users] ssh key problems

Hi all!

I've got a very strange problem here. It might be a very basic thing, but two people here are quite stumped by this.

We've got 16 nodes running 6u8 (an upgrade to 6u10 is planned but not yet done). Nodes 1-10 were set up by someone else, who has since left. They are running fine.

Now nodes 11-16 have been delivered and need to be integrated into the single queue we're running. I followed the instructions, created ssh and rss keys for all the new nodes, and ran the install_execd script on each of them. I can see the nodes via qstat and qmon, and they seem to be fine. But any job submitted to the new nodes fails with a "host key verification error".

Manually, I can ssh between the nodes without a password prompt (as the same user that GE runs as).

We're running the setup described in "Using ssh with qrsh and qlogin": http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html

From qconf -mconf:

rlogin_daemon                /usr/sbin/sshd -i
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 false
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
rlogin_command               /usr/bin/ssh

Is there something I can test? I'm pretty sure it's a stupid simple thing we're overlooking...

Cheers for any pointers!

