[GE users] Qmaster dies on startup: !!!!!!!!!! got NULL element for QU_rerun !!!!!!!!!!

jesperkrogh jesper at krogh.cc
Thu Jan 21 06:49:06 GMT 2010


Hi List.

I have a Gridengine installation that has run more than 2,500,000 jobs
over 18 months without any problems. Last night, however, the qmaster
stopped working, and it writes these errors to the log file when I try
to start it up again:

01/21/2010 07:20:43|qmaster|ko|I|read job database with 8258 entries in 1 seconds
01/21/2010 07:20:44|qmaster|ko|I|qmaster hard descriptor limit is set to 8192
01/21/2010 07:20:44|qmaster|ko|I|qmaster soft descriptor limit is set to 8192
01/21/2010 07:20:44|qmaster|ko|I|qmaster will use max. 8172 file descriptors for communication
01/21/2010 07:20:44|qmaster|ko|I|qmaster will accept max. 99 dynamic event clients
01/21/2010 07:20:44|qmaster|ko|I|starting up GE 6.1u4 (lx24-amd64)
01/21/2010 07:20:44|qmaster|ko|E|commlib error: got read error (closing "host12.internal/qstat/3")
01/21/2010 07:20:54|qmaster|ko|E|JEXITING report for job 2899864.1: which is in status 0
01/21/2010 07:20:54|qmaster|ko|E|JEXITING report for job 2899991.1: which is in status 0
01/21/2010 07:20:54|qmaster|ko|E|JEXITING report for job 2899369.1: which is in status 0
01/21/2010 07:20:54|qmaster|ko|E|cqueue_list_locate_qinstance("(null)@(null)"): cqueue == NULL("(null)", "(null)", 1, 0
01/21/2010 07:20:54|qmaster|ko|E|writing job finish information: can't locate queue "(null)@(null)"
01/21/2010 07:20:54|qmaster|ko|W|job 2900040.1 failed on host <unknown host> before writing exit_status because: shepherd exited with exit status 19
01/21/2010 07:20:54|qmaster|ko|C|!!!!!!!!!! got NULL element for QU_rerun !!!!!!!!!!
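
For anyone triaging a similar log, here is a minimal Python sketch that
splits entries of the form shown above (timestamp|component|host|level|message)
and keeps only the E and C ones. It assumes every entry starts with an
MM/DD/YYYY HH:MM:SS stamp and tolerates entries that a mail client has wrapped
across lines. The function names parse_messages and serious are made up for
this illustration and are not part of Grid Engine.

```python
import re

# An entry begins with "MM/DD/YYYY HH:MM:SS|", as in the log excerpt above.
ENTRY_START = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\|")


def parse_messages(text):
    """Return one dict per log entry, joining lines that were wrapped."""
    # Flatten everything to a single line so wrapped entries are rejoined.
    flat = " ".join(line.strip() for line in text.splitlines() if line.strip())
    starts = [m.start() for m in ENTRY_START.finditer(flat)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(flat)]):
        chunk = flat[begin:end].strip()
        # Split into at most five fields; the message may itself contain "|".
        parts = chunk.split("|", 4)
        if len(parts) != 5:
            continue
        timestamp, component, host, level, message = parts
        entries.append({
            "timestamp": timestamp,
            "component": component,
            "host": host,
            "level": level,
            "message": message,
        })
    return entries


def serious(entries, levels=("E", "C")):
    """Keep only entries whose level letter is in `levels`."""
    return [e for e in entries if e["level"] in levels]
```

Reading the qmaster messages file into a string and calling
serious(parse_messages(text)) leaves just the error and critical entries,
which here would end with the QU_rerun line.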

I think I'm using Berkeley DB spooling, and the DB file is the only place
where "grep" can find anything about job 2900040 (I have erased the job
files).
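
The spooling method can be confirmed rather than guessed: a Grid Engine 6.x
cell normally records it under the spooling_method key of its bootstrap file.
A minimal sketch for reading that file follows. The path
$SGE_ROOT/$SGE_CELL/common/bootstrap is assumed from a standard installation,
and read_bootstrap is a hypothetical helper name.

```python
def read_bootstrap(text):
    """Parse whitespace-separated 'key value' lines, skipping '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # First token is the key; the rest of the line (if any) is the value.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[key] = value
    return settings


def spooling_method(text):
    """Return the configured spooling method, or None if the key is absent."""
    return read_bootstrap(text).get("spooling_method")
```

Passing the contents of the bootstrap file to spooling_method should return
"berkeleydb" for Berkeley DB spooling or "classic" for flat-file spooling.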

SGE: 6.1u4

Should I just re-install everything and go on from here? Or are there
other ways around this?

-- 
Jesper

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=240108
