[GE users] Restarting sge_execd on all nodes

reuti reuti at staff.uni-marburg.de
Fri Mar 6 11:45:48 GMT 2009


Hi,

On 06.03.2009 at 00:29, paulu wrote:

> On Wednesday 04 March 2009, reuti wrote:
>> On 03.03.2009 at 23:33, paulu wrote:
>>>
>>> All nodes were still up and running; only the sge_execd daemon
>>> on each node had died due to an (NFS) fileserver failure. SGE is
>>> installed on that fileserver, and the spool directories and so on
>>> are on it as well.
>>
>> Hi,
>>
>> then I would suggest making at least the spool directories local;
>> this shouldn't happen then, as sge_execd should survive. This was
>> just discussed on the list:
>>
>> http://gridengine.sunsource.net/howto/nfsreduce.html
>
> Thanks.
>
>> Anyway: was it a hard or soft mount of the NFS?
>
> It is a hard mount. Does that make sge_execd more susceptible to file
> server failure, or less?

A hard mount is fine. Normally the OS should just wait until the
hard mount becomes available again. Was the file server rebooted? For
the mounts to survive that, you might need to supply file system IDs
(the "fsid" option) in /etc/exports, like:

/usr/sge    @nodes(rw,root_squash,anonuid=63,anongid=58,sync,fsid=1004,subtree_check)

-- Reuti


> Paul.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=122145

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


