[GE users] scheduler goes down

Philipp Drum sge-users at schupppi.de
Thu May 12 11:22:41 BST 2005


* Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> There is nothing in the scheduler or qmaster message file?

05/02/2005 09:56:51|schedd|ludwig|E|commlib error: got connect timeout (connect timeout error)
05/02/2005 09:56:52|schedd|ludwig|E|can't send asynchronous message to commproc (qmaster:1) on host "ludwig": got connect timeout
05/02/2005 09:56:52|schedd|ludwig|I|controlled shutdown 6.0u3
05/02/2005 09:57:07|schedd|ludwig|I|starting up 6.0u3
05/02/2005 16:34:40|schedd|ludwig|E|commlib error: can't connect to service (connect error errno=101)
05/02/2005 16:34:42|schedd|ludwig|E|can't send asynchronous message to commproc (qmaster:1) on host "ludwig": can't connect to service
05/02/2005 16:34:42|schedd|ludwig|I|controlled shutdown 6.0u3
05/02/2005 16:41:54|schedd|ludwig|I|starting up 6.0u3
05/03/2005 13:45:47|schedd|ludwig|I|controlled shutdown 6.0u3
05/04/2005 11:55:49|schedd|ludwig|I|starting up 6.0u3

05/06/2005 09:58:46|schedd|ludwig|I|starting up 6.0u3

05/06/2005 12:56:55|schedd|ludwig|I|starting up 6.0u3
05/06/2005 21:31:38|schedd|ludwig|C|lGetUlong: no such name (1250, RN_min) in descriptor

05/09/2005 09:06:32|schedd|ludwig|I|starting up 6.0u3
05/10/2005 11:34:40|schedd|ludwig|I|controlled shutdown 6.0u3

On the master (as expected):

... |qmaster|ludwig|E|acknowledge timeout after 600 seconds for event client (schedd:1) on host "ludwig"
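
For anyone who wants to sift a longer messages file for the same pattern, here is a minimal Python sketch. It assumes the layout visible in the excerpts above: five pipe-separated fields (timestamp, daemon, host, level, message) with the timestamp written as MM/DD/YYYY HH:MM:SS. Treating level "I" as informational and everything else as worth a look is my reading of these lines, not something taken from the Grid Engine documentation. The names `LogEntry`, `parse_line` and `problems` are made up for this illustration.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    daemon: str
    host: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one 'timestamp|daemon|host|level|message' line, or return None."""
    # Split at most 4 times so a '|' inside the message text is preserved.
    parts = line.strip().split("|", 4)
    if len(parts) != 5:
        return None
    try:
        # Assumed timestamp format, inferred from the excerpts in this post.
        when = datetime.strptime(parts[0], "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(when, *parts[1:])


def problems(lines: Iterable[str]) -> List[LogEntry]:
    """Return every parsed entry whose level is not 'I' (assumed: info)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.level != "I"]


SAMPLE = [
    '05/02/2005 09:56:51|schedd|ludwig|E|commlib error: got connect timeout (connect timeout error)',
    '05/02/2005 09:56:52|schedd|ludwig|I|controlled shutdown 6.0u3',
    '05/06/2005 21:31:38|schedd|ludwig|C|lGetUlong: no such name (1250, RN_min) in descriptor',
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry.when.isoformat(), entry.level, entry.message)
```

Lines that do not match the assumed layout (blank lines, or a line whose timestamp has been cut off) are skipped rather than guessed at.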

> You have the problem with both versions?

Yes. That's why I asked about 'known' problems. If there aren't any, I
would assume some issue with our hardware/OS.

> You could run the scheduler in debug mode and see, where it stops.

Yes, we are doing this right now.


regards, Philipp
