[GE users] sge_qmaster start problem

gclark geoff.clark at mdacorporation.com
Wed Jan 6 17:29:58 GMT 2010


Ok... I don't quite understand this.

I could not find any qmaster or other SGE-named files under /tmp.

I then looked in spool/qmaster/messages. I was seeing the following every time I attempted to start SGE:
01/06/2010 12:16:41|  main|xcat02|C|job file "jobs/00/0018/6190" has zero size
01/06/2010 12:16:41|  main|xcat02|C|job file "jobs/00/0018/6194/common" has zero size
01/06/2010 12:16:41|  main|xcat02|C|job file "jobs/00/0018/6205" has zero size
01/06/2010 12:16:41|  main|xcat02|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
01/06/2010 12:16:41|  main|xcat02|E|error in init_packbuffer: wrong cull version
01/06/2010 12:16:41|  main|xcat02|E|job file "186235" has wrong file name - deleting
01/06/2010 12:16:41|  main|xcat02|C|!!!!!!!!!! got NULL element for JB_pe !!!!!!!!!!
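
For anyone triaging a similar messages file, here is a minimal Python sketch that groups entries by the one-letter level field. It assumes the pipe-delimited layout visible in the lines above (timestamp, thread, host, level, message). The function name group_by_level is made up for illustration and is not part of SGE.

```python
from collections import defaultdict


def group_by_level(lines):
    """Group message texts by the one-letter level field (e.g. 'C', 'E')."""
    groups = defaultdict(list)
    for line in lines:
        # Assumed layout: timestamp|thread|host|level|message
        # maxsplit=4 keeps any '|' characters inside the message intact.
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # skip lines that do not match the assumed layout
        _timestamp, _thread, _host, level, message = parts
        groups[level.strip()].append(message)
    return dict(groups)
```

Passing it the lines of spool/qmaster/messages returns a dict keyed by level, which makes it easier to see the C entries separately from the E entries.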

I decided to remove all of the entries under jobs/00/0018. After removing these job files I was able to start SGE successfully again. I'm not sure what caused the problems logged above, and I don't know which of them kept SGE from starting.
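
Before deleting anything, the zero-size job files can be listed first. The following is only a sketch in Python: the function name find_zero_size_files is hypothetical, the location of the jobs directory depends on the local installation, and it only reports files, it does not remove them.

```python
import os


def find_zero_size_files(jobs_dir):
    """Return sorted paths of regular files under jobs_dir that are 0 bytes long."""
    empty = []
    for root, _dirs, files in os.walk(jobs_dir):
        for name in files:
            path = os.path.join(root, name)
            # Only consider real regular files, not symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

It would be called with the path to the jobs directory inside the qmaster spool directory (an assumption about the layout, based on the relative paths in the messages above), ideally while sge_qmaster is stopped so the files are not changing underneath it.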

Does anyone know which of these errors would have prevented SGE from starting, and what might have caused them in the first place?

Thanks,
Geoff

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236872
