[GE users] Restarting sge_execd on all nodes

paulu pcu-m at xs4all.nl
Tue Mar 3 22:33:30 GMT 2009


On Tuesday 03 March 2009, reuti wrote:
> Hi Paul,
>
> Am 03.03.2009 um 00:12 schrieb paulu:
> > This weekend, by a fileserver failure, the queue master went down
> > together with the sge_execd daemons on all nodes.
> >
> > Everything is working again. Restarting all sge_execd daemons was
> > done by logging on remotely on each node and starting the daemon
> > from the commandline manually.
> >
> > Is there some smarter way to do that, for example analogous to
> > the 'qconf -ke all' command? I guess it is a bit of a catch-22
> > situation, because there's no daemon to talk to yet.
> >
> > Of course I could do some scripting, using 'qselect -qs u' to
> > iterate over all unavailable nodes, but perhaps there is a more
> > elegant way.
> >
> > Any suggestion would be welcome.
>
> How did you install SGE? Usually there will be links and scripts
> installed in e.g. /etc/init.d and the rc3.d and rc5.d
> subdirectories to start it automatically during boot (under Linux
> the location depends on the distribution). The only pitfall is that
> they might start too early, before other necessary system daemons
> are up.
>
> I usually make SGE the last to start during boot and the first
> to stop during shutdown, i.e. entries like "S99sgemaster.p6444 ->
> ../sgemaster.p6444".
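The ordering reuti describes relies on the classic SysV init convention: links in an rc directory whose names start with "S" are run in lexical order at boot, and links starting with "K" are run in lexical order at shutdown. A high number like S99 therefore starts last, and a low number like K01 stops first. The Python sketch below only illustrates that ordering. Apart from the sgemaster.p6444 name, the directory contents (including the K01 link) are hypothetical, and distributions that use dependency-based boot ordering may not follow the numbers strictly.

```python
# Illustration of classic SysV rc-directory ordering (an assumption about
# the init system; dependency-based boot systems may reorder scripts).


def start_order(entries):
    """Return the S* (start) links in the order init would run them at boot."""
    return sorted(e for e in entries if e.startswith("S"))


def stop_order(entries):
    """Return the K* (kill) links in the order init would run them at shutdown."""
    return sorted(e for e in entries if e.startswith("K"))


if __name__ == "__main__":
    # Hypothetical rc3.d contents; only "sgemaster.p6444" comes from the thread.
    rc3d = [
        "S05network",
        "K01sgemaster.p6444",
        "S10nfs",
        "K15nfs",
        "S99sgemaster.p6444",
        "K20network",
    ]
    print(start_order(rc3d))  # sgemaster comes last at boot
    print(stop_order(rc3d))   # sgemaster comes first at shutdown
```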

Reuti,

The problem is not that the sge_execd daemons do not start on boot. 

All nodes were still up and running; only the sge_execd daemon on
each node had died due to an (NFS) fileserver failure. SGE is
installed on that fileserver, and the spool directories and so on
are on it as well.

So I just wanted to know if there was an elegant way to start the 
sge_execd daemons again on all nodes with a single command.

Somebody else suggested using pdsh, which I quite like. So that's the 
path I will follow, I guess.
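The following is a minimal Python sketch that combines the 'qselect -qs u' idea with pdsh. It takes the queue instances that qselect reports (names like all.q@node01), reduces them to unique host names, and issues a single pdsh command that starts sge_execd on all of them. It makes several assumptions. The qmaster must already be running again, because qselect needs it. pdsh must be installed. The command must run with sufficient privileges (normally root). Finally, the execd startup script is assumed to live at $SGE_ROOT/$SGE_CELL/common/sgeexecd, which is the usual layout but should be checked locally. Because SGE_ROOT is on the shared fileserver here, the same path is valid on every node.

```python
import os
import shlex
import subprocess


def hosts_from_qselect(output):
    """Extract unique host names from qselect output.

    qselect prints one queue instance per line, e.g. "all.q@node01";
    the host is the part after the '@'. Order of first appearance is kept.
    """
    hosts = []
    for line in output.splitlines():
        line = line.strip()
        if "@" not in line:
            continue
        host = line.split("@", 1)[1]
        if host not in hosts:
            hosts.append(host)
    return hosts


def pdsh_command(hosts, remote_cmd):
    """Build a pdsh argv that runs remote_cmd on all hosts in parallel."""
    return ["pdsh", "-w", ",".join(hosts), remote_cmd]


def restart_unknown_execds(dry_run=True):
    """Start sge_execd on every host whose queue instance is in state 'u'."""
    result = subprocess.run(["qselect", "-qs", "u"], capture_output=True, text=True)
    hosts = hosts_from_qselect(result.stdout)
    if not hosts:
        return None
    # Assumption: standard layout with the execd startup script in
    # $SGE_ROOT/$SGE_CELL/common (verify on your installation).
    sge_root = os.environ["SGE_ROOT"]
    cell = os.environ.get("SGE_CELL", "default")
    cmd = pdsh_command(hosts, f"{sge_root}/{cell}/common/sgeexecd start")
    if dry_run:
        print(shlex.join(cmd))  # only show what would be run
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Offline demo with made-up qselect output and a hypothetical
    # /opt/sge path; no cluster is contacted.
    sample = "all.q@node01\nall.q@node02\nbig.q@node01\n"
    hosts = hosts_from_qselect(sample)
    print(hosts)
    print(shlex.join(pdsh_command(hosts, "/opt/sge/default/common/sgeexecd start")))
```

By default, restart_unknown_execds() only prints the pdsh command line. Call restart_unknown_execds(dry_run=False) to actually run it. Having pdsh fan out the command keeps it to one invocation regardless of the number of nodes.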

Thanks.

Paul.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=119883