[GE users] Restarting sge_execd on all nodes

reuti reuti at staff.uni-marburg.de
Wed Mar 4 09:49:00 GMT 2009

On 03.03.2009 at 23:33, paulu wrote:

> On Tuesday 03 March 2009, reuti wrote:
>> Hi Paul,
>> On 03.03.2009 at 00:12, paulu wrote:
>>> This weekend, due to a fileserver failure, the queue master went down
>>> together with the sge_execd daemons on all nodes.
>>> Everything is working again. Restarting all sge_execd daemons was
>>> done by logging on remotely on each node and starting the daemon
>>> from the commandline manually.
>>> Is there some smarter way to do that, for example analogous to
>>> the 'qconf -ke all' command? I guess it is a bit of a catch-22
>>> situation, because there's no daemon to talk to yet.
>>> Of course I could do some scripting, using 'qselect -qs u' to
>>> iterate over all unavailable nodes, but perhaps there is a more
>>> elegant way.
>>> Any suggestion would be welcome.
>> How did you install SGE? Usually there will be links and scripts
>> installed in e.g. /etc/init.d and the rc3.d and rc5.d
>> subdirectories to start it automatically during boot (under Linux
>> the location depends on the distribution). The only pitfall is that
>> they might start too early, when other necessary system daemons are
>> not up yet.
>> I usually move the startup of SGE to be the last during startup and
>> the first during shutdown, i.e. entries like
>> "S99sgemaster.p6444 -> ../sgemaster.p6444".
> Reuti,
> The problem is not that the sge_execd daemons do not start on boot.
> All nodes were still up and running, only the sge_execd daemon on
> each node had died due to a (NFS) fileserver failure. On that
> fileserver SGE is installed. Also the spool directories and so on are
> on that fileserver.


Then I would suggest making at least the spool directories local;
then this shouldn't happen, and the sge_execd should survive. It was
just on the list:


Anyway: was it a hard or soft mount of the NFS?
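
To check the hard/soft question on a Linux node, one option is to look at the options column of /proc/mounts. The following is only a minimal sketch: the function name is made up for illustration, and it assumes the usual six-column mount table format with "hard" or "soft" listed among the options of nfs/nfs4 entries.

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'hard', 'soft', or 'unknown'.

    mounts_text is the content of a mount table in /proc/mounts format:
    device, mount point, filesystem type, options, dump, pass.
    """
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and everything that is not NFS.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        if "soft" in options:
            modes[fields[1]] = "soft"
        elif "hard" in options:
            modes[fields[1]] = "hard"
        else:
            # Assumption: neither keyword is listed, so do not guess.
            modes[fields[1]] = "unknown"
    return modes
```

On a node it could be called as nfs_mount_modes(open("/proc/mounts").read()); the result is a dictionary keyed by mount point.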

-- Reuti

> So I just wanted to know if there was an elegant way to start the
> sge_execd daemons again on all nodes with a single command.
> Somebody else suggested using pdsh, which I quite like. So that's the
> path I will follow, I guess.
> Thanks.
> Paul.
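
For the archive, the 'qselect -qs u' plus pdsh idea could be sketched roughly as below. This is an illustration under assumptions, not a tested recipe: the helper names are hypothetical, the start command (for example "/etc/init.d/sgeexecd start") depends on how SGE was installed and must be supplied by the caller, and qselect and pdsh are assumed to be in the PATH on the machine where this runs.

```python
import subprocess


def hosts_from_qselect(qselect_output):
    """Extract unique host names from 'queue@host' lines, keeping order."""
    hosts = []
    for line in qselect_output.splitlines():
        line = line.strip()
        if "@" not in line:
            continue
        host = line.split("@", 1)[1]
        if host not in hosts:
            hosts.append(host)
    return hosts


def build_pdsh_command(hosts, start_cmd):
    """Build the pdsh argument list that runs start_cmd on every host."""
    return ["pdsh", "-w", ",".join(hosts), start_cmd]


def restart_unreachable_execds(start_cmd, dry_run=True):
    """Start sge_execd on all hosts whose queue instances are in state 'u'.

    start_cmd is site-specific (an assumption of this sketch).
    With dry_run=True the pdsh command is only printed, not executed.
    """
    # The exit status of qselect is deliberately not checked here.
    result = subprocess.run(
        ["qselect", "-qs", "u"], capture_output=True, text=True
    )
    hosts = hosts_from_qselect(result.stdout)
    if not hosts:
        return []
    cmd = build_pdsh_command(hosts, start_cmd)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return hosts
```

Running it once with dry_run=True shows which hosts would be touched before anything is started.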
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=119883
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

