[GE users] Upgrading from 6.1u4 to 6.2

Aaron Turner aaron at cs.york.ac.uk
Tue Dec 2 10:17:13 GMT 2008


petrik wrote:
> This does not refer to /var/log/messages, but to the
> QMASTER_SPOOL_DIR/messages file. It means that the upgrade finished, 
> but the qmaster could not start. What does the file say?

This was the first file I checked, but unfortunately it contains nothing 
at all relating to the failure to start the master.

However, Reuti's hint about looking in /tmp turned up a file called 
schedd_messages, which contains the following (I've replaced the hostname 
of the master below with sgem and sgemFQDN so that the lines break up 
less):

12/01/2008 18:06:22|schedd|sgem|E|commlib error: endpoint is not unique 
error (endpoint "sgemFQDN/schedd/1" is already connected)
12/01/2008 18:06:38|schedd|sgem|E|commlib error: endpoint is not unique 
error (endpoint "sgemFQDN/schedd/1" is already connected)
12/01/2008 18:07:58|schedd|sgem|C|scheduler already running
12/01/2008 18:08:14|schedd|sgem|C|scheduler already running

However, this refers to the scheduler rather than the master.

There is also a file called sge_messages, containing:

12/01/2008 18:31:14|  main|sgem|E|communication error for 
"sgemFQDN/qmaster/1" running on port 536: "can't bind socket"
12/01/2008 18:31:15|  main|sgem|E|commlib error: can't bind socket (no 
additional information available)
12/01/2008 18:31:43|  main|sgem|C|abort qmaster startup due to 
communication errors
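
Both files use the same pipe-separated layout (timestamp, component, host, 
level, message), so the E and C entries can be pulled out mechanically. 
The following is only a minimal sketch: the names LogEntry, parse_line and 
severe_entries are made up for illustration, and it assumes each entry 
sits on a single line (the excerpts above are wrapped by the mail client, 
so the continuation lines would simply be skipped).

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    component: str
    host: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one 'timestamp|component|host|level|message' line.

    Returns None for lines that do not have five fields, e.g. the
    wrapped continuation lines seen in a mail quote.
    """
    # Split at most 4 times so a '|' inside the message is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return LogEntry(*(p.strip() for p in parts))


def severe_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Keep only the entries whose level field is E or C."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level in ("E", "C")]
```

For example, severe_entries(open("/tmp/sge_messages")) would return the 
E and C entries from the second file, assuming it is readable at that 
path.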

The /etc/services entry for the qmaster is unchanged from the 6.1 
installation:

sge_qmaster     536/tcp

A telnet to this port does get a connection.
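
A telnet connection succeeding means that something is accepting 
connections on port 536, which would be consistent with the "can't bind 
socket" error if another process is still holding the port. As a rough 
check, the sketch below tries to bind a TCP socket to a given port and 
reports whether the address is already in use. The function name can_bind 
is hypothetical, and this only approximates what the qmaster does at 
startup (its socket options are not known here). Binding to a port below 
1024, such as 536, normally needs root, otherwise a permission error is 
raised instead.

```python
import errno
import socket


def can_bind(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now.

    Returns False when the address is already in use. Any other failure
    (for example insufficient privileges for a port below 1024) is
    re-raised so it is not mistaken for "port busy".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
        return True


if __name__ == "__main__":
    # 536 is the sge_qmaster port from the /etc/services entry above.
    print("port 536 free" if can_bind(536) else "port 536 already in use")
```

If this reports the port as in use while no qmaster is meant to be 
running, the next step would be to find which process owns it.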

Currently I'm stumped!

Regards,

   Aaron Turner

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=90701