[GE users] shadow master problems

murple andreas.kuntzagk at mdc-berlin.de
Fri Jul 16 13:50:17 BST 2010


Today we tested shadow master functionality on our cluster. After making 
$SGE_ROOT/$SGE_CELL/spool shared between the two master nodes (login1 and 
login2), stopping the qmaster on login2 (the old master) and removing the lock 
file seemed to work at first: login1 noticed the stale heartbeat and started its 
own qmaster. Unfortunately, the exec nodes did not notice the change and still 
tried to contact login2.
Error messages:

main|node001|W|can't register at "qmaster": unable to contact qmaster using port 6444 on host "login2"

Our setup is 6.2 with a shared $SGE_ROOT/$SGE_CELL/common;
$SGE_ROOT/$SGE_CELL/spool is shared only between the login/head nodes.
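
One check that might help narrow this down, as a rough sketch only: as far as I 
know, the daemons look up the current master in the file 
$SGE_ROOT/$SGE_CELL/common/act_qmaster, so comparing its content on an exec node 
with the host that took over should show whether the node sees the change. The 
function names below are made up for illustration, and the cell name "default" 
is an assumption.

```python
from pathlib import Path


def read_act_qmaster(sge_root, sge_cell="default"):
    """Return the host name stored in <sge_root>/<sge_cell>/common/act_qmaster.

    Assumes the usual layout in which that file holds the name of the
    currently active qmaster host; "default" as cell name is an assumption.
    """
    path = Path(sge_root) / sge_cell / "common" / "act_qmaster"
    return path.read_text().strip()


def master_matches(sge_root, expected_host, sge_cell="default"):
    """Check whether act_qmaster names the expected host.

    Only the short host names are compared, case-insensitively, so that
    "login1" and "login1.example.org" count as the same machine.
    """
    actual = read_act_qmaster(sge_root, sge_cell)
    return actual.split(".")[0].lower() == expected_host.split(".")[0].lower()
```

Run on an exec node after the failover, master_matches(sge_root, "login1") with 
the node's own $SGE_ROOT would be expected to return True if the node sees the 
updated file. If it still names login2, the file itself was not updated or the 
node sees a stale copy.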

Regards, Andreas

