[GE users] qmaster crashes several times before remaining stable

mhanby mhanby at uab.edu
Mon Jan 18 16:41:23 GMT 2010


Howdy,

We are running GE 6.2u4 on a Rocks 5.3 cluster.

Whenever I reboot the head node (qmaster node), the qmaster process starts and then stops after several seconds. I have to restart it several times before it stays running.

I don't see any /tmp/qmaster* files.
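To repeat that check from a script, for instance right after the next reboot, here is a minimal sketch using Python's standard `glob` module. The function name is hypothetical, and the `/tmp` default simply mirrors the `/tmp/qmaster*` pattern mentioned above.

```python
import glob
import os
from typing import List


def find_qmaster_tmp_files(tmpdir: str = "/tmp") -> List[str]:
    """Return the paths in tmpdir whose names start with 'qmaster', sorted.

    Hypothetical helper: it only lists matching files; it does not read them.
    A missing directory simply yields an empty list.
    """
    return sorted(glob.glob(os.path.join(tmpdir, "qmaster*")))
```

An empty list from `find_qmaster_tmp_files()` corresponds to the "no /tmp/qmaster* files" observation above.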

Here are the entries from qmaster/messages. Any suggestions? Alternatively, can I increase the verbosity the next time I reboot so that I can glean a bit more info?

01/17/2010 13:27:10|  main|cheaha|I|read job database with 265 entries in 0 seconds
01/17/2010 13:27:10|  main|cheaha|W|removing reference to no longer existing job 61689 of user "jsmith"
01/17/2010 13:27:10|  main|cheaha|W|removing reference to no longer existing job 61775 of user "jsmith"
01/17/2010 13:27:10|  main|cheaha|E|error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory
01/17/2010 13:27:10|  main|cheaha|I|qmaster hard descriptor limit is set to 8192
01/17/2010 13:27:10|  main|cheaha|I|qmaster soft descriptor limit is set to 8192
01/17/2010 13:27:10|  main|cheaha|I|qmaster will use max. 8172 file descriptors for communication
01/17/2010 13:27:10|  main|cheaha|I|qmaster will accept max. 99 dynamic event clients
01/17/2010 13:27:10|  main|cheaha|I|starting up GE 6.2u4 (lx26-amd64)
01/17/2010 13:29:01|  main|cheaha|I|read job database with 265 entries in 0 seconds
01/17/2010 13:29:01|  main|cheaha|E|error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory
01/17/2010 13:29:01|  main|cheaha|I|qmaster hard descriptor limit is set to 8192
01/17/2010 13:29:01|  main|cheaha|I|qmaster soft descriptor limit is set to 8192
01/17/2010 13:29:01|  main|cheaha|I|qmaster will use max. 8172 file descriptors for communication
01/17/2010 13:29:01|  main|cheaha|I|qmaster will accept max. 99 dynamic event clients
01/17/2010 13:29:01|  main|cheaha|I|starting up GE 6.2u4 (lx26-amd64)
01/17/2010 13:29:48|  main|cheaha|I|read job database with 265 entries in 1 seconds
01/17/2010 13:29:48|  main|cheaha|E|error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory
01/17/2010 13:29:48|  main|cheaha|I|qmaster hard descriptor limit is set to 8192
01/17/2010 13:29:48|  main|cheaha|I|qmaster soft descriptor limit is set to 8192
01/17/2010 13:29:48|  main|cheaha|I|qmaster will use max. 8172 file descriptors for communication
01/17/2010 13:29:48|  main|cheaha|I|qmaster will accept max. 99 dynamic event clients
01/17/2010 13:29:48|  main|cheaha|I|starting up GE 6.2u4 (lx26-amd64)
01/17/2010 13:35:30|  main|cheaha|I|read job database with 265 entries in 0 seconds
01/17/2010 13:35:30|  main|cheaha|E|error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory
01/17/2010 13:35:30|  main|cheaha|I|qmaster hard descriptor limit is set to 8192
01/17/2010 13:35:30|  main|cheaha|I|qmaster soft descriptor limit is set to 8192
01/17/2010 13:35:30|  main|cheaha|I|qmaster will use max. 8172 file descriptors for communication
01/17/2010 13:35:30|  main|cheaha|I|qmaster will accept max. 99 dynamic event clients
01/17/2010 13:35:30|  main|cheaha|I|starting up GE 6.2u4 (lx26-amd64)
01/17/2010 13:39:56|  main|cheaha|I|read job database with 263 entries in 0 seconds
01/17/2010 13:39:56|  main|cheaha|E|error opening file "/opt/gridengine/default/spool/qmaster/./sharetree" for reading: No such file or directory
01/17/2010 13:39:56|  main|cheaha|I|qmaster hard descriptor limit is set to 8192
01/17/2010 13:39:56|  main|cheaha|I|qmaster soft descriptor limit is set to 8192
01/17/2010 13:39:56|  main|cheaha|I|qmaster will use max. 8172 file descriptors for communication
01/17/2010 13:39:56|  main|cheaha|I|qmaster will accept max. 99 dynamic event clients
01/17/2010 13:39:56|  main|cheaha|I|starting up GE 6.2u4 (lx26-amd64)
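To make the restart pattern easier to see, the messages above can be summarized with a short script. This is a minimal sketch that assumes every line has the five pipe-separated fields visible in this log (date and time, thread, host, one-letter severity, message). It also assumes the date is month/day/year, since the `17` in `01/17/2010` can only be a day. The function names are hypothetical and are not part of Grid Engine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class LogEntry:
    timestamp: datetime
    thread: str
    host: str
    level: str  # one-letter severity as seen in the log: I, W, E
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one qmaster/messages line; return None if it does not fit the format."""
    parts = line.strip().split("|", 4)  # at most 5 fields; the message keeps any extra '|'
    if len(parts) != 5:
        return None
    stamp, thread, host, level, message = parts
    try:
        ts = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")  # assumed month/day/year
    except ValueError:
        return None
    return LogEntry(ts, thread.strip(), host, level, message)


def summarize(lines: Iterable[str]) -> Tuple[List[datetime], List[float], List[str]]:
    """Return startup times, seconds between consecutive startups, and distinct E-level messages."""
    entries = [e for e in (parse_line(l) for l in lines) if e is not None]
    startups = [e.timestamp for e in entries if e.message.startswith("starting up")]
    gaps = [(b - a).total_seconds() for a, b in zip(startups, startups[1:])]
    errors = sorted({e.message for e in entries if e.level == "E"})
    return startups, gaps, errors
```

Fed the five startups above, `summarize` reports gaps of 111, 47, 342 and 266 seconds and a single distinct error (the missing `sharetree` file). Because the log contains no shutdown line, each gap is only an upper bound on how long that qmaster instance ran, and it also includes the time spent restarting it by hand. To run it on the real file, something like `summarize(open(path))` should work, assuming `path` points at the messages file in the spool directory named in the error lines.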

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239562
