[GE users] effect of automounting home folders in SGE environment under largish qmake loads?

Chris Dagdigian dag at sonsorol.org
Wed Oct 15 19:11:18 BST 2008


Hi folks,

Trying to debug a partial application failure where the most obvious  
STDERR output looks exactly like what one would expect if the keys for  
passwordless SSH were missing or messed up:

>> Permission denied, please try again.
>> Permission denied, please try again.
>> Permission denied (publickey,gssapi-with-mic,password).
>> error: error reading returncode of remote command

Of course manually SSH'ing into these nodes works perfectly and all  
the permission/UID/GID stuff looks great. No problem with the SSH key  
files from what I can tell.

The application is the Solexa pipeline, which uses "qmake" under  
the hood to shotgun out lots of short- and long-running tasks.

I just realized that this cluster is automounting individual user home  
folders at login time.

One explanation for random "permission denied" errors that look SSH  
key related would be the cluster being under heavy load and automount  
getting hammered. If the automount of a home folder failed or timed  
out on some or all of the nodes, the user's SSH keys would be missing  
on those nodes, and that would certainly cause the errors above.

I'm not an automount user myself so I wanted to run this by the list  
-- it feels "right" to me that a heavy workload making lots of qmake  
calls (qmake starts its remote tasks via 'qrsh') is going to put some  
stress on automount as the folders get mounted (and presumably  
unmounted) while tasks are scattered across nodes. Any automount delay  
or failure on a home folder would mean that the SSH keys are not  
accessible, which would cause the login/authentication issues I've  
been seeing.

Is that a valid guess or am I grasping at straws here? I'm going to  
recommend that automount be replaced with a static mount of /home  
before we try to reproduce the error.
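One way to test the guess before changing the mounts might be to time how long it takes to reach the home folder and the key file from inside tasks dispatched the same way the pipeline's are. Below is a minimal sketch in Python, assuming the keys live in the OpenSSH default location ~/.ssh/authorized_keys; probe_path is a hypothetical helper, not part of SGE:

```python
import os
import socket
import threading
import time


def probe_path(path, timeout=10.0):
    """Stat `path` in a daemon thread; return (ok, seconds, error).

    Touching a path under an automounted home triggers the mount, so a
    slow or hung mount shows up as a large elapsed time or a timeout.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.stat(path)
            result["error"] = None
        except OSError as exc:
            result["error"] = exc.strerror or str(exc)
        result["elapsed"] = time.monotonic() - start

    # Daemon thread: if stat() blocks on a hung mount, the script can
    # still report the timeout and exit instead of hanging with it.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return False, timeout, "timed out"
    return result["error"] is None, result["elapsed"], result["error"]


if __name__ == "__main__":
    home = os.path.expanduser("~")
    # Assumption: sshd reads the user's keys from ~/.ssh/authorized_keys.
    for p in (home, os.path.join(home, ".ssh", "authorized_keys")):
        ok, secs, err = probe_path(p)
        print(f"{socket.gethostname()} {p}: ok={ok} {secs:.3f}s {err or ''}")
```

Submitted through qrsh to many nodes at once (e.g. `qrsh python3 probe_home.py`, where probe_home.py is a hypothetical filename), timeouts or multi-second stats on some nodes would support the automount theory, while uniformly fast results would point somewhere else.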

-Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net