[GE users] occasional job failure - can't find user's home directory

cjf001 john.foley at motorola.com
Wed Oct 27 19:22:53 BST 2010


Ah - to avoid the dreaded "stale NFS handle" error, I guess? That's
a good idea - although in my particular case, we use NetApp fileservers,
so their export implementation is a bit different - I'll have to
see if they support the "fsid" option.
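
For Linux-style exports like the two lines Reuti quotes below, a quick sanity check that every export carries its own fsid could look roughly like this. This is only a minimal sketch in Python: the function name find_fsid_problems and the sample text are my own assumptions, it only understands the simple "path client(options)" layout, and it says nothing about how a NetApp filer describes its exports.

```python
import re

# Matches one client spec such as "@nodes(fsid=1001,rw,sync)".
CLIENT_RE = re.compile(r"(\S+?)\(([^)]*)\)")


def find_fsid_problems(exports_text):
    """Return a list of problems found in Linux-style exports text.

    Hypothetical helper for illustration: it flags client specs that have
    no fsid option and fsid values reused by a different exported path.
    """
    problems = []
    seen = {}  # fsid value -> first path that used it
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        path = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        for match in CLIENT_RE.finditer(rest):
            client, opts = match.groups()
            options = [o.strip() for o in opts.split(",")]
            fsids = [o.split("=", 1)[1] for o in options if o.startswith("fsid=")]
            if not fsids:
                problems.append(f"{path}: no fsid for client {client}")
                continue
            owner = seen.setdefault(fsids[0], path)
            if owner != path:
                problems.append(f"{path}: fsid={fsids[0]} already used by {owner}")
    return problems


if __name__ == "__main__":
    # Sample text copied from the exports lines quoted in this thread.
    sample = (
        "/usr/sge  @nodes(fsid=1001,rw,root_squash,sync,subtree_check)\n"
        "/home     @nodes(fsid=1002,rw,no_root_squash,sync,no_subtree_check)\n"
    )
    for problem in find_fsid_problems(sample):
        print(problem)
```

With the two quoted lines as input the list of problems is empty, since each path has its own fsid value.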

    Thanks,

      John


reuti wrote:
> On 27.10.2010 at 19:59, cjf001 wrote:
>
>> yes, that's probably my case, too, although I have only 2 NFS
>> servers that hold all the home directories. I think to "turn off"
>> the automounter, I'd just have to "hard-mount" the /users volume
>> on each of the NFS servers from all the compute nodes - so we're
>> talking 2 mounts, instead of hundreds - probably very doable.
>> However, the user volumes on each of the NFS servers are named
>> the same, and the automounter currently chooses between them,
>> so that little issue would have to be handled with some name
>> changes or something.
>
> Yep. Just to note that there is a feature in SGE to handle different mount points: sge_aliases (`man sge_aliases`).
>
> The usual thing I add is a fixed file system id in the /etc/exports file to allow a reboot of the fileserver:
>
> /usr/sge        @nodes(fsid=1001,rw,root_squash,sync,subtree_check)
> /home           @nodes(fsid=1002,rw,no_root_squash,sync,no_subtree_check)
>
> -- Reuti
>
>
>> Thanks to all for their replies ! I'll have to decide whether
>> or not it's worth it to remove a 0.1% failure rate....
>>
>>     John
>>
>>
>> Adam Tygart wrote:
>>> In all honesty, there are good reasons to use the automounter on a
>>> Beowulf cluster.
>>> For example, my file server is running OpenSolaris, and each of my users'
>>> home directories sits on a different ZFS "filesystem" (the reason
>>> for this is simpler snapshots and better quota management).
>>> Under your solution I would have to mount each of my 275 home
>>> directories on each of my 125 nodes on the off chance that a
>>> particular user is going to need access to those files on that node.
>>> This would put an unnecessary strain on my file server. With the
>>> automounter, only the users who are logged in have their home
>>> directories mounted.
>>>
>>> --
>>> Adam
>>>
>>> On Wed, Oct 27, 2010 at 17:31, jlforrest <jlforrest at berkeley.edu> wrote:
>>>> On 10/27/2010 10:19 AM, John Foley wrote:
>>>>> Jon -
>>>>>
>>>>> thanks for the reply - that's very interesting info. One
>>>>> question, though - when you say you "turned off" the
>>>>> automounter, I'm assuming that you also had to hard-mount
>>>>> the user's home directory areas, and any other NFS
>>>>> areas that the simulations require ?
>>>>
>>>> That's right. If the private network in a Beowulf
>>>> cluster has the kind of problems that the automounter
>>>> solves, then you have a sick cluster.
>>>>
>>>> I believe that for a standard Beowulf cluster an automounter
>>>> isn't necessary on the compute nodes. In environments
>>>> where compute nodes mount from "non-standard" file servers
>>>> or have other unusual features this might not be true.
>>>> But, in my Rocks cluster, I've had absolutely no problems
>>>> due to turning off the automounter.
>>>>
>>>> Jon
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290496
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>



-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# LV Simulation Cluster Support       #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
               (this email sent using SeaMonkey on Windows)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290528

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


