[GE users] yet another commlib error

Christian Reissmann Christian.Reissmann at Sun.COM
Tue Dec 6 10:43:54 GMT 2005


Hello Michael,

This does not look like a "commlib error"; it is more of a "qmaster
doesn't start up" error.

Is qmaster running after starting up?
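
A quick way to check, as a rough sketch only: look for the sge_qmaster
process and see whether anything is listening on the qmaster port (536,
taken from the error output you quoted). The helper name is_running is
just illustrative, and netstat output differs between systems, so treat
the port check as approximate.

```shell
# Hypothetical helper: succeeds if a process with exactly this name exists.
is_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

if is_running sge_qmaster; then
    echo "sge_qmaster is running"
else
    echo "sge_qmaster is NOT running"
fi

# Port 536 comes from the quoted error message; adjust it if your
# installation uses a different qmaster port.
netstat -an | grep -w 536 | grep LISTEN || echo "nothing listening on port 536"
```

If the process is missing and nothing is listening, the "Connection
refused" from sge_schedd is only a consequence of qmaster not starting.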

Your configuration may be corrupt. Did you shut down the qmaster before
removing the file systems?
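
To narrow this down, here is a minimal sketch that prints only the
error-level entries from the messages file and checks whether the hosts
named in your log resolve on the master. The field layout
(timestamp|daemon|host|level|text) is inferred from the log you quoted,
the function names are made up for illustration, and the idea that the
"unknown host" entries could come from name resolution rather than from
a damaged spool is only an assumption on my part.

```shell
# Print only error-level ("E") entries from a qmaster messages file.
# Assumes '|' separated fields with the level in field 4, as in the quoted log.
qmaster_errors() {
    awk -F'|' '$4 == "E"' "$1"
}

# Succeeds if the given name resolves on this machine.
host_resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Path taken from the quoted shell prompt; adjust for your cell.
MESSAGES=/srv/N1GE/default/spool/qmaster/messages
if [ -r "$MESSAGES" ]; then
    qmaster_errors "$MESSAGES" | tail -n 20
fi

# Host names as they appear in the quoted log.
for h in g1.biocl.weizmann.ac.il g3.biocl.weizmann.ac.il; do
    host_resolves "$h" || echo "cannot resolve $h"
done
```

If the names do resolve, the spooled configuration itself becomes the
more likely suspect.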


Regards,

Christian


Michael Green wrote:
>SLES9SP1
>N1GE 6U6
>1 master <-NAT-> 8 nodes
>$SGE_ROOT=/srv/N1GE on physical shared file system (GPFS) on IBM FASTt700 SAN.
>
>Yesterday I had IBM staff over here servicing the storage. I cleanly
>unmounted file systems and shut down all machines before they put
>their hands on it.
>
>After they finished I booted the systems, and everything went without
>a hitch, except that SGE refused to start.
>
>On the master:
><code>
>gene1:/srv/N1GE/default/spool/qmaster # /etc/init.d/sgemaster start
>   starting sge_qmaster
>
>sge_qmaster didn't start!
>Please check the messages file
>
>   starting sge_schedd
>error: commlib error: can't connect to service (Connection refused)
>error: getting configuration: unable to contact qmaster using port 536
>on host "gene1.weizmann.ac.il"
>error: can't get configuration from qmaster -- backgrounding
></code>
>
><log>
>gene1:/srv/N1GE/default/spool/qmaster # tail -f messages
>12/06/2005 10:24:54|qmaster|gene1|E|missing configuration attribute "hostname"
>12/06/2005 10:24:54|qmaster|gene1|E|cannot recreate queue all.q from
>disk because of unknown host g1.biocl.weizmann.ac.il
>12/06/2005 10:24:54|qmaster|gene1|I|read job database with 1 entries
>in 0 seconds
>12/06/2005 10:24:54|qmaster|gene1|E|cqueue_list_locate_qinstance("all.q@g3.biocl.weizmann.ac.il"):
>cqueue == NULL("all.q", "g3.biocl.weizmann.ac.il", 1, 0)
>12/06/2005 10:24:54|qmaster|gene1|E|can't find queue
>"all.q@g3.biocl.weizmann.ac.il" referenced in job 27
></log>
>
>qmaster complains about a missing "hostname" attribute, but which file
>contains it? Grepping the default/ directory reveals quite a few
>files containing 'hostname'.
>Also, regarding the line with 'cqueue_list_locate_qinstance': does it
>check the cqueues/all.q file?
>
>Please help!
>--
>Warm regards,
>Michael Green
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  

