[GE users] Recovering from a failover

adary adary at marvell.com
Tue Aug 18 07:19:56 BST 2009


By default, whichever machine is currently running the qmaster process is the master.

If you want to fall back to the "original" qmaster, you need to create a file called primary_qmaster in your $SGE_ROOT/$SGE_CELL/common directory. It should contain a single line: the hostname of your designated primary qmaster host.
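For anyone who prefers to script this, here is a minimal Python sketch that writes the one-line file described above. The function name write_primary_qmaster and the example host name are hypothetical; it also assumes that "default" is the cell name when SGE_CELL is unset, which is the usual SGE convention but should be checked against your installation.

```python
import os
from pathlib import Path
from typing import Optional


def write_primary_qmaster(
    primary_host: str,
    sge_root: Optional[str] = None,
    sge_cell: Optional[str] = None,
) -> Path:
    """Write $SGE_ROOT/$SGE_CELL/common/primary_qmaster containing one line: the primary host."""
    # Fall back to the environment variables set by the SGE settings files.
    root = sge_root or os.environ["SGE_ROOT"]
    # Assumption: the cell is called "default" when SGE_CELL is not set.
    cell = sge_cell or os.environ.get("SGE_CELL", "default")
    path = Path(root) / cell / "common" / "primary_qmaster"
    # Exactly one line of data; raises FileNotFoundError if the common dir is missing.
    path.write_text(primary_host.strip() + "\n")
    return path
```

For example, write_primary_qmaster("master01.example.com") on a host with SGE_ROOT set would create the file for a (hypothetical) primary called master01.example.com. Use the host name exactly as Grid Engine knows it (short or fully qualified).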

This way, when the failed primary qmaster host recovers (after a reboot or whatever else fixed it), it will see that it should be the qmaster, send the shutdown signal to the currently acting qmaster host, and start its own qmaster daemon.
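The takeover rule described above boils down to one comparison. The following is a hypothetical Python model of that decision only, not Grid Engine's actual code, and it does not perform the shutdown or restart itself. It assumes the act_qmaster file in the same common directory names the currently acting qmaster host, and that host names are compared exactly as written (mixing short and fully qualified names would need normalising first).

```python
from pathlib import Path


def should_reclaim_qmaster(common_dir: Path, local_host: str) -> bool:
    """Hypothetical model: True if local_host should take the qmaster role back."""
    primary_file = common_dir / "primary_qmaster"
    act_file = common_dir / "act_qmaster"
    if not primary_file.exists():
        # No designated primary: whichever host runs qmaster stays the master.
        return False
    primary = primary_file.read_text().strip()
    # act_qmaster names the host currently acting as qmaster (empty if absent).
    active = act_file.read_text().strip() if act_file.exists() else ""
    # Reclaim only if this host is the designated primary and another host is acting.
    return primary == local_host and active != local_host
```

On the recovered primary, local_host could come from socket.gethostname(), provided it matches the form of name written into primary_qmaster.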



-----Original Message-----
From: isakrejda [mailto:isakrejda at lbl.gov]
Sent: Tuesday, August 18, 2009 9:08 AM
To: users at gridengine.sunsource.net
Subject: [GE users] Recovering from a failover

Hi,

I have an sge_shadowd running and it properly starts a master
on the backup node when the primary server crashes.
But then what is the best way to get the master running properly
on the primary server once it gets fixed and ready to take
its duties back? I should mention I am running SGE 6.2u2.

Thank you...

iwona

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=212788

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=212791
