[GE users] occasional job failure - can't find user's home directory

reuti reuti at staff.uni-marburg.de
Wed Oct 27 19:12:03 BST 2010


On 27.10.2010 at 19:59, cjf001 wrote:

> yes, that's probably my case, too, although I have only 2 NFS
> servers that hold all the home directories. I think to "turn off"
> the automounter, I'd just have to "hard-mount" the /users volume
> on each of the NFS servers from all the compute nodes - so we're
> talking 2 mounts, instead of hundreds - probably very doable.
> However, the user volumes on each of the NFS servers are named
> the same, and the automounter currently chooses between them,
> so that little issue would have to be handled with some name
> changes or something.

Yep. Just a note: SGE has a feature for handling different mount points, sge_aliases (`man sge_aliases`).
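
As a rough sketch of what such a file looks like: each line has four whitespace-separated fields (source path, submit host, execution host, replacement path), and `*` matches any host. The first entry is the usual default; the /users1/ and /users/ paths in the second entry are made up for illustration and are not taken from anyone's setup in this thread. The cluster-wide file normally lives in <sge_root>/<cell>/common/sge_aliases, and users can add their own entries in $HOME/.sge_aliases.

```
# src-path   submit-host   exec-host   replacement
/tmp_mnt/    *             *           /
# hypothetical: a cwd starting with /users1/ at submission time
# is rewritten to start with /users/ on the execution host
/users1/     *             *           /users/
```

Note that this only rewrites the working directory path SGE uses when it starts the job, so check the man page for whether it covers your case.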

The usual thing I add is a fixed file system ID (fsid) for each export in /etc/exports, so that the clients' mounts stay valid across a reboot of the file server:

/usr/sge        @nodes(fsid=1001,rw,root_squash,sync,subtree_check)
/home           @nodes(fsid=1002,rw,no_root_squash,sync,no_subtree_check)

-- Reuti


> Thanks to all for their replies! I'll have to decide whether
> or not it's worth it to remove a 0.1% failure rate....
> 
>    John
> 
> 
> Adam Tygart wrote:
>> In all honesty, there are good reasons to use the automounter on a
>> Beowulf cluster.
>> For example, my file server is running OpenSolaris, and each of my users'
>> home directories falls on a different zfs "filesystem." (The reason
>> for this is simpler snapshots and better quota management).
>> Under your solution I would have to mount each of my 275 home
>> directories on each of my 125 nodes on the off chance that a
>> particular user is going to need access to those files on that node.
>> This would create an unnecessary strain on my file server. Using the
>> automounter, only the users that are logged in have their home
>> directories mounted.
>> 
>> --
>> Adam
>> 
>> On Wed, Oct 27, 2010 at 17:31, jlforrest<jlforrest at berkeley.edu>  wrote:
>>> On 10/27/2010 10:19 AM, John Foley wrote:
>>>> Jon -
>>>> 
>>>> thanks for the reply - that's very interesting info. One
>>>> question, though - when you say you "turned off" the
>>>> automounter, I'm assuming that you also had to hard-mount
>>>> the user's home directory areas, and any other NFS
>>>> areas that the simulations require ?
>>> 
>>> That's right. If the private network in a Beowulf
>>> cluster has the kind of problems that the automounter
>>> solves, then you have a sick cluster.
>>> 
>>> I believe that for a standard Beowulf cluster an automounter
>>> isn't necessary on the compute nodes. In environments
>>> where compute nodes mount from "non-standard" file servers
>>> or have other unusual features this might not be true.
>>> But, in my Rocks cluster, I've had absolutely no problems
>>> due to turning off the automounter.
>>> 
>>> Jon
>>> 
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290496
>>> 
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>> 
> 
> 
> 
> -- 
> ###########################################################################
> # John Foley                          # Location:  IL93-E1-21S            #
> # IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
> # LV Simulation Cluster Support       #    Email: john.foley at motorola.com #
> # Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
> # 600 North US Highway 45             #      Fax: (847) 523-5767          #
> # Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
> ###########################################################################
>               (this email sent using SeaMonkey on Windows)
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290521



