[GE users] Shadow master

heywood heywood at cshl.edu
Thu May 6 19:39:50 BST 2010

It used to be that installing the shadow master just involved putting the
hostname of the machine that will run the shadow master in
/opt/sge/default/common/shadow_masters, and then starting the shadow master
on that node with "/opt/sge/default/common/sgemaster -shadowd". A few SGE
versions ago I tested failover and it was fine. sge_qmaster runs on the main
head node and sge_shadowd runs on the shadow/spare head node.
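
For reference, the older manual procedure described above can be sketched roughly as below. This is only a sketch under assumptions, not an official procedure: the helper name add_shadow_host and the hostname "sparehead" are hypothetical, and the paths are the ones from this message.

```shell
#!/bin/sh
# Hypothetical helper: append a host to a shadow_masters-style file
# (one hostname per line) unless it is already listed.
add_shadow_host() {
    file=$1
    host=$2
    touch "$file" || return 1
    # -x: match the whole line, -F: treat the hostname as a fixed string
    if grep -qxF -- "$host" "$file"; then
        return 0
    fi
    printf '%s\n' "$host" >> "$file"
}

# Usage as described in this message (run the second command on the
# shadow/spare head node); "sparehead" is a placeholder hostname:
#   add_shadow_host /opt/sge/default/common/shadow_masters sparehead
#   /opt/sge/default/common/sgemaster -shadowd
```

The duplicate check keeps the file unchanged if the step is repeated after an upgrade or reinstall.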

Earlier this week the main head node was rebooted, and failover appeared to
work, since the act_qmaster file was updated to hold the shadow/spare node
name. But SGE commands then failed with an error saying they couldn't find
a connection on the qmaster port, and qping failed the same way.
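
A quick way to tell "act_qmaster was rewritten" apart from "a qmaster is actually answering" is to read the file and then probe that host. The sketch below is illustrative only: the function names and the hostname "sparehead" are hypothetical, and the qping invocation in the comment is an assumption about its argument order, so check it against the qping man page for your version.

```shell
#!/bin/sh
# Hypothetical helper: print the host named in an act_qmaster-style file
# (the first line, with surrounding whitespace stripped).
act_qmaster_host() {
    head -n 1 "$1" | tr -d '[:space:]'
}

# Hypothetical helper: succeed if the file names the given host.
is_acting_master() {
    [ "$(act_qmaster_host "$1")" = "$2" ]
}

# Usage with the path from this message; "sparehead" is a placeholder:
#   host=$(act_qmaster_host /opt/sge/default/common/act_qmaster)
#   is_acting_master /opt/sge/default/common/act_qmaster sparehead && echo "failover recorded"
# Then, on that host, check that an sge_qmaster process is really running:
#   pgrep -x sge_qmaster
# Assumed qping form (verify before relying on it):
#   qping -info "$host" "$SGE_QMASTER_PORT" qmaster 1
```

If the file names the spare node but no sge_qmaster process is running there, the shadow daemon updated the file without managing to start the master, which would match the symptoms described.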

So I looked at the 6.2u5 docs, and they now say to "install" the shadow
master with "./inst_sge -sm". OK, maybe something changed since the shadow
master failover worked for us in an earlier version. But trying that, I get:

Creating local configuration
value == NULL for attribute "mailer" in configuration list of "bhmnode1"

./util/install_modules/inst_common.sh: line 261: Translate: command not

./util/install_modules/inst_common.sh: line 263: Translate: command not
./util/install_modules/inst_common.sh: line 264: Translate: command not

So... How do I get shadow master failover working again?


