[GE users] occasional job failure - can't find user's home directory

reuti reuti at staff.uni-marburg.de
Thu Oct 28 09:39:29 BST 2010


On 27.10.2010 at 20:22, cjf001 wrote:

> Ah - to avoid the dreaded "stale NFS handle" error, I guess?

Correct -- Reuti


> That's
> a good idea - although in my particular case, we use NetApp fileservers,
> so their export implementation is a bit different - I'll have to
> see if they support the "fsid" option.
> 
>    Thanks,
> 
>      John
> 
> 
> reuti wrote:
>> On 27.10.2010 at 19:59, cjf001 wrote:
>> 
>>> yes, that's probably my case, too, although I have only 2 NFS
>>> servers that hold all the home directories. I think to "turn off"
>>> the automounter, I'd just have to "hard-mount" the /users volume
>>> on each of the NFS servers from all the compute nodes - so we're
>>> talking 2 mounts, instead of hundreds - probably very doable.
>>> However, the user volumes on each of the NFS servers are named
>>> the same, and the automounter currently chooses between them,
>>> so that little issue would have to be handled with some name
>>> changes or something.
>> 
>> Yep. Just to note that SGE has a feature for handling different mount points: sge_aliases (`man sge_aliases`).
>> 
>> The usual thing I add is a fixed file system ID in the /etc/exports file, so that the file server can be rebooted without the clients ending up with stale file handles:
>> 
>> /usr/sge        @nodes(fsid=1001,rw,root_squash,sync,subtree_check)
>> /home           @nodes(fsid=1002,rw,no_root_squash,sync,no_subtree_check)
>> 
>> -- Reuti
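
As a rough illustration of the fixed-fsid convention in the /etc/exports lines quoted above, here is a minimal Python sketch that flags exported paths with no fsid= option and fsid values shared by more than one path. The function name check_fsids and the EXPORTS sample are hypothetical, made up for this example; the parsing is deliberately simple and assumes one export per line, with no backslash continuations or quoted paths.

```python
import re
from collections import defaultdict

# Sample text mirroring the two export lines quoted above (illustration only).
EXPORTS = """\
/usr/sge        @nodes(fsid=1001,rw,root_squash,sync,subtree_check)
/home           @nodes(fsid=1002,rw,no_root_squash,sync,no_subtree_check)
"""

# Matches an fsid=<value> option inside a client option list.
FSID_RE = re.compile(r"\bfsid=([^,)\s]+)")


def check_fsids(exports_text):
    """Return (paths_without_fsid, fsids_shared_by_several_paths).

    Assumption: one export per line; continuation lines and quoted
    paths containing spaces are not handled by this sketch.
    """
    missing = []
    paths_by_fsid = defaultdict(set)
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path = line.split()[0]
        fsids = FSID_RE.findall(line)
        if not fsids:
            missing.append(path)
        for fsid in fsids:
            paths_by_fsid[fsid].add(path)
    shared = {f: sorted(p) for f, p in paths_by_fsid.items() if len(p) > 1}
    return missing, shared


if __name__ == "__main__":
    missing, shared = check_fsids(EXPORTS)
    print("exports without fsid:", missing)
    print("fsid values used by several paths:", shared)
```

To check a real server, one could pass the contents of its /etc/exports file to check_fsids instead of the sample string. Whether a NetApp filer offers an equivalent setting is not something this sketch can answer.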
>> 
>> 
>>> Thanks to all for their replies ! I'll have to decide whether
>>> or not it's worth it to remove a 0.1% failure rate....
>>> 
>>>    John
>>> 
>>> 
>>> Adam Tygart wrote:
>>>> In all honesty, there are good reasons to use the automounter on a
>>>> Beowulf cluster.
>>>> For example, my file server is running OpenSolaris, each of my users'
>>>> home directories falls on a different zfs "filesystem." (The reason
>>>> for this is simpler snapshots and better quota management).
>>>> Under your solution I would have to mount each of my 275 home
>>>> directories on each of my 125 nodes on the off chance that a
>>>> particular user is going to need access to those files on that node.
>>>> This would create an unnecessary strain on my file server. Using
>>>> the automounter, only the users who are logged in have their home
>>>> directories mounted.
>>>> 
>>>> --
>>>> Adam
>>>> 
>>>> On Wed, Oct 27, 2010 at 17:31, jlforrest <jlforrest at berkeley.edu> wrote:
>>>>> On 10/27/2010 10:19 AM, John Foley wrote:
>>>>>> Jon -
>>>>>> 
>>>>>> thanks for the reply - that's very interesting info. One
>>>>>> question, though - when you say you "turned off" the
>>>>>> automounter, I'm assuming that you also had to hard-mount
>>>>>> the user's home directory areas, and any other NFS
>>>>>> areas that the simulations require ?
>>>>> 
>>>>> That's right. If the private network in a Beowulf
>>>>> cluster has the kind of problems that the automounter
>>>>> solves, then you have a sick cluster.
>>>>> 
>>>>> I believe that for a standard Beowulf cluster an automounter
>>>>> isn't necessary on the compute nodes. In environments
>>>>> where compute nodes mount from "non-standard" file servers
>>>>> or have other unusual features this might not be true.
>>>>> But, in my Rocks cluster, I've had absolutely no problems
>>>>> due to turning off the automounter.
>>>>> 
>>>>> Jon
>>>>> 
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290496
>>>>> 
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> ###########################################################################
>>> # John Foley                          # Location:  IL93-E1-21S            #
>>> # IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
>>> # LV Simulation Cluster Support       #    Email: john.foley at motorola.com #
>>> # Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
>>> # 600 North US Highway 45             #      Fax: (847) 523-5767          #
>>> # Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
>>> ###########################################################################
>>>               (this email sent using SeaMonkey on Windows)
>>> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=290750

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


