[GE users] Odd behavior with act_qmaster - file contents change
Brett W Grant
Brett_W_Grant at raytheon.com
Wed Oct 22 16:27:49 BST 2008
I am running Grid Engine 6.1 on a cluster of Macs. All but two of the Macs
run 10.4 Tiger; the other two run 10.5.5 Leopard. At 7:16 local time this
morning, the contents of the act_qmaster file changed from the qmaster host
to one of these Leopard Macs. In the spool/qmaster/messages file at 7:17
there is a message about a corrupted database being detected, then a
DB_RUNRECOVERY message, and then a number of messages where gethostbyname fails.
If I look at the messages file on the host that was named in the
self-modified act_qmaster file, it simply says at 7:20 that it couldn't
connect to service.
There was no longer a sgemaster process running on the original qmaster.
This system has been running just fine for over a year. I did add the two
Leopard clients about a month ago, but they have been working fine.
I guess that I don't really understand what the act_qmaster file is for. I
didn't see an entry in the Manual section. How could it change by itself?
What should I do to prevent this from happening in the future? Where
else can I look to see what happened? I didn't see anything at all in the