[GE users] effect of automounting home folders in SGE environment under largish qmake loads?

Dave Love d.love at liverpool.ac.uk
Thu Oct 16 17:09:37 BST 2008


Chris Dagdigian <dag at sonsorol.org> writes:

> I'm not an automount user myself

The first question might be which automounter, with what configuration?

> so I wanted to run this by the list
> -- it feels "right" to me that a heavy workload making use of heavy
> qmake (aka 'qrsh') calls is going to put some stress on automount as
> the folders get mounted (and presumably unmounted) as tasks are
> scattered across nodes. And any automount delays or failures with a
> home folder would mean that the SSH keys would not be accessible and
> that would cause the login/authentication issues I've been seeing.

Based on long experience with amd, I wouldn't expect that unless the
mount really does time out somehow, waiting on the server.  In any
case, you surely want to look in syslog for auth and NFS messages.
What you see from a failed ssh will depend on the OS and on how ssh is
configured (e.g. whether it uses PAM), I think.  I doubt this is
really SGE-related, though.
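
As a rough illustration of the syslog check, something like the
following could pull out the relevant lines.  This is only a sketch:
the patterns are assumptions about what sshd, the kernel NFS client
and the automounter typically log, and the actual wording and log file
location vary with the OS and configuration.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive list):
# - sshd[...] entries, which carry authentication successes and failures
# - the Linux kernel NFS client's "not responding" complaint
# - messages tagged by an automounter daemon (autofs "automount" or "amd")
PATTERNS = [
    re.compile(r"sshd\[\d+\]"),
    re.compile(r"nfs: server \S+ not responding", re.IGNORECASE),
    re.compile(r"automount\[\d+\]|amd\[\d+\]"),
]


def interesting_lines(lines):
    """Return the log lines matching any pattern, in their original order."""
    return [ln.rstrip("\n") for ln in lines
            if any(p.search(ln) for p in PATTERNS)]
```

You would feed it the open log file, e.g.
interesting_lines(open("/var/log/messages")), where that path is itself
an assumption (some systems use /var/log/syslog or /var/log/secure for
auth messages).  Lining up the timestamps of failed logins against NFS
or automounter complaints on the same node should show whether the two
are related at all.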

> Is that a valid guess or am I grasping at straws here? I'm going to
> recommend that automount be replaced with a static mount of /home
> before we try to reproduce the error.

Although I generally like automounting, it does seem best just to
hard-mount /home on the compute nodes.  Then if the fileserver goes
away, jobs just hang (in the absence of any other timeouts) until it
comes back.  I've had that work OK plenty of times in an SGE
environment.
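
To confirm that a node really has /home hard-mounted after such a
change, a small check along these lines might do.  It is a sketch that
assumes Linux-style /proc/mounts text (device, mount point, type,
options) and treats an NFS mount as hard unless the "soft" option is
present; the function names are made up for the example.

```python
def nfs_mount_options(mounts_text, mount_point="/home"):
    """Return the option set of the NFS mount at mount_point, or None.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, where, fstype, options = fields[:4]
        if where == mount_point and fstype.startswith("nfs"):
            return set(options.split(","))
    return None


def is_hard_mounted(mounts_text, mount_point="/home"):
    """True if mount_point is an NFS mount without the 'soft' option."""
    opts = nfs_mount_options(mounts_text, mount_point)
    if opts is None:
        return False  # not mounted over NFS at all (or only an autofs entry)
    return "soft" not in opts
```

On a node you would call is_hard_mounted(open("/proc/mounts").read()).
An automounted /home that nobody has touched yet shows up only as an
autofs entry, so the check returns False for it.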
