[GE users] Qmaster seg faulting after 6.1u2 upgrade

david zanella zanella at mayo.edu
Mon Aug 27 16:46:26 BST 2007


I've had something similar happen twice. 

The first time, I lost a queue, tried applying patches, and qmaster refused to
start. When it did come up, it had lost most of the queues and hostgroups.

The second time, qmaster crashed one weekend and refused to start. I tracked
it down to (at least) one old/defunct host in the config. I went into the
qmaster config files and manually removed all references to old/defunct
hosts, then kept restarting qmaster and watching the messages file (look
closely), cleaning things up by hand until it eventually started. I seem to
remember there were some discrepancies between short hostnames and FQDNs that
it didn't like.
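
One way to spot that kind of hostname/FQDN mismatch is sketched below. It is
only an illustration: mixed_host_forms is a hypothetical helper, not part of
Grid Engine, and it assumes you have already collected the host names found
in your config files into a plain list, one name per line.

```shell
# Hypothetical helper (not an SGE tool): read host names on stdin, one per
# line, and report every host that appears both as a bare short name and as
# a fully qualified name.
mixed_host_forms() {
    awk '
        NF == 0 { next }                  # skip blank lines
        {
            name = tolower($1)            # host names are case-insensitive
            short = name
            sub(/\..*/, "", short)        # strip everything after the first dot
            if (name == short)
                bare[short] = 1           # seen as a short name
            else
                fq[short] = name          # seen as an FQDN
        }
        END {
            for (s in bare)
                if (s in fq)
                    print s, fq[s]
        }
    ' | sort
}
```

For example, feeding it the names node1, node1.example.com and
node2.example.com prints "node1 node1.example.com", which is the sort of
pair worth checking by hand.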

I used a combination of find and rgrep to ferret out the old/bad entries.
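
A minimal sketch of that kind of search follows. find_host_refs is a
hypothetical name, plain grep stands in for rgrep, and the directory you pass
in is whatever holds your qmaster config and spool files (the exact layout is
an assumption that varies by install). The -w option is not in POSIX grep but
is available in GNU and BSD grep.

```shell
# Hypothetical helper: list every file under a directory that still refers
# to a given host, either in its contents or in its file name.
find_host_refs() {
    dir="$1"
    host="$2"
    {
        # Files whose contents mention the host.
        # -l: print file names only, -w: whole-word match so "node1" does
        # not also hit "node10", -F: treat the host name as a literal string.
        find "$dir" -type f -exec grep -lwF -- "$host" {} +
        # Files named after the host (short name or FQDN).
        find "$dir" -type f \( -name "$host" -o -name "$host.*" \)
    } | sort -u
}
```

Typical use would be something like find_host_refs /path/to/qmaster/config
oldhost, reviewing each file it prints before editing anything.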

Not for the faint of heart... and make sure you save a backup of all the
config files somewhere first.
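
As a rough sketch of taking that backup before hand-editing, assuming a
hypothetical helper named backup_config and a destination directory that
lives outside the tree being archived:

```shell
# Hypothetical helper: archive a config directory into a timestamped tarball
# and print the archive path. The archive name is an arbitrary choice.
backup_config() {
    src="$1"        # directory holding the config files to save
    dest_dir="$2"   # where to put the archive (must not be inside $src)
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/sge-config-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to $src.
    tar -czf "$archive" -C "$src" . || return 1
    printf '%s\n' "$archive"
}
```

Listing the result with tar -tzf before touching anything is a cheap way to
confirm the archive actually contains the files you expect.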


> I just installed the 6.1u2 patch to our SGE 6.1 installation, and SGE will
> not start up:
> 
> [root@bhmnode2 n1ge6]# default/common/sgemaster
>    starting sge_qmaster
> 
> sge_qmaster didn't start!
> Please check the messages file
> 
>    starting sge_schedd
> error: commlib error: can't connect to service (Connection refused)
> error: getting configuration: unable to contact qmaster using port 6444 on
> host "bhmnode2"
> error: can't get configuration from qmaster -- backgrounding
> [root@bhmnode2 n1ge6]#
> 
> 
> There is nothing in the .../spool/qmaster/messages file after the messages
> about shutting down 6.1 prior to the upgrade.
> 
> However, /var/log/messages says sge_qmaster is seg faulting:
> 
> 
> Aug 27 10:18:50 bhmnode2 kernel: sge_qmaster[14591]: segfault at
> 0000038700000384 rip 00000039fa471d23 rsp 0000007fbfffd520 error 4
> 
> 
> I tried restoring from backup, and the backup also gives the same seg
> faulting behavior now!
> 
> Any ideas/help greatly appreciated (ASAP).
> 
> Thanks,
> 
> Todd
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
