[GE users] NULL element for JAT_prio

Jesse Becker beckerjes at mail.nih.gov
Wed Dec 10 16:18:06 GMT 2008


I came into work this morning and found that qstat from my 6.2 install was throwing this lovely message:

   critical error: !!!!!!!!!! got NULL element for JAT_prio !!!!!!!!!!

More specifically, it looks like this:

[saturn ~]# qstat
job-ID  prior   name       user     state  submit/start at     queue                   slots ja-task-ID
---------------------------------------------------------------------------------------------------------
1327918 0.73703 solexa-clu solexa    r     12/10/2008 08:15:46 interactive.q@gcr06n20   4
1327917 0.62500 s081209.15 solexa    S     12/10/2008 08:01:24 high.q@gcr06n20          4
1327920 0.73688 solexa-clu solexa    r     12/10/2008 08:15:56 interactive.q@gcr06n24   4
1327928 1.00000 s081209.16 solexa    r     12/10/2008 09:21:25 high.q@gcr06n08          4
1327953 0.21717 FA081106.1 solexa    r     12/10/2008 09:41:13 low.q@gcr07n12           1
1327967 0.92874 s081209.17 solexa    r     12/10/2008 10:41:06 high.q@gcr06n17          4
critical error: !!!!!!!!!! got NULL element for JAT_prio !!!!!!!!!!
Aborted
[saturn ~]#


Digging a little bit deeper, using 'dl 2', I get this output (trimmed for space):

   756  14264         main <-- job_stdout_job() ../clients/qstat/qstat.c 1134 }
   757  14264         main <-- sge_handle_job() ../clients/common/sge_qstat.c 2491 }
   758  14264         main <-- handle_jobs_queue() ../clients/common/sge_qstat.c 715 }
   759  14264         main <-- qstat_handle_running_jobs() ../clients/common/sge_qstat.c 505 }
   760  14264         main --> sge_log() {
   761  14264         main     ../libs/cull/cull_multitype.c 153 !!!!!!!!!! got NULL element for JAT_prio !!!!!!!!!!
   762  14264         main <-- sge_log() ../libs/uti/sge_log.c 620 }

Using 'dl 4', there's a little bit more:

  1253  19495         main <-- job_stdout_job() ../clients/qstat/qstat.c 1134 }
  1254  19495         main <-- sge_handle_job() ../clients/common/sge_qstat.c 2491 }
  1255  19495         main <-- handle_jobs_queue() ../clients/common/sge_qstat.c 715 }
  1256  19495         main <-- qstat_handle_running_jobs() ../clients/common/sge_qstat.c 505 }
  1257  19495 182894186240 --> sge_set_message_id_output() {
  1258  19495 182894186240 <-- sge_set_message_id_output() ../libs/uti/sge_language.c 498 }
  1259  19495 182894186240 --> sge_gettext_() {
  1260  19495 182894186240 --> sge_get_message_id_output_implementation() {
  1261  19495 182894186240 <-- sge_get_message_id_output_implementation() ../libs/uti/sge_language.c 582 }
  1262  19495 182894186240 <-- sge_gettext_() ../libs/uti/sge_language.c 730 }
  1263  19495 182894186240 --> sge_set_message_id_output() {
  1264  19495 182894186240 <-- sge_set_message_id_output() ../libs/uti/sge_language.c 498 }
  1265  19495         main --> sge_log() {
  1266  19495         main     ../libs/cull/cull_multitype.c 153 !!!!!!!!!! got NULL element for JAT_prio !!!!!!!!!!
  1267  19495 182894186240 --> sge_gettext_() {
  1268  19495 182894186240 --> sge_get_message_id_output_implementation() {
  1269  19495 182894186240 <-- sge_get_message_id_output_implementation() ../libs/uti/sge_language.c 582 }
  1270  19495 182894186240 <-- sge_gettext_() ../libs/uti/sge_language.c 730 }
  1271  19495         main <-- sge_log() ../libs/uti/sge_log.c 620 }


Restarting qmaster doesn't help.

Possibly related: I did the upgrade in October, and have since
occasionally seen the message "got NULL element for SME_message_list"
in the qmaster messages file.  When this happens, qmaster shuts down
and must be restarted.
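
For anyone who wants to check how often this message turns up, here is a
minimal Python sketch that tallies it per day.  It assumes each line of
the qmaster messages file begins with an MM/DD/YYYY date followed by
whitespace; the function name and the sample lines are made up for
illustration, and the real file layout may differ.

```python
from collections import Counter

NEEDLE = "got NULL element for SME_message_list"

def count_by_day(lines, needle=NEEDLE):
    """Count lines containing `needle`, keyed by the first whitespace-separated token."""
    counts = Counter()
    for line in lines:
        if needle in line:
            parts = line.split()
            if parts:
                counts[parts[0]] += 1  # assumed to be the date
    return counts

# Hypothetical sample lines; the pipe-separated layout is an assumption.
sample = [
    "12/08/2008 03:10:11|qmaster|saturn|C|!!!!!!!!!! got NULL element for SME_message_list !!!!!!!!!!",
    "12/08/2008 03:10:12|qmaster|saturn|I|some unrelated message",
    "12/10/2008 07:59:01|qmaster|saturn|C|!!!!!!!!!! got NULL element for SME_message_list !!!!!!!!!!",
]

for day, n in sorted(count_by_day(sample).items()):
    print(day, n)
```

To run it against a real log, pass an open file object for the qmaster
messages file instead of `sample`.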

This happens a few times a week, and I haven't been able to track down
the cause.  Looking over the logs, there may be a correlation between
large numbers of jobs finishing and this message, but nothing more solid.
As a workaround, I have a cron job that checks whether qmaster is running
and starts it if needed.
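
A rough sketch of that kind of watchdog, in Python, is below.  This is
not the actual cron job, just an illustration.  It assumes the daemon
process is named sge_qmaster, that `pgrep -x` is available on the host,
and that qmaster can be started with the sgemaster script; the path in
START_COMMAND is a placeholder and the function names are hypothetical.

```python
import subprocess

# Placeholder path: adjust to wherever the startup script lives on your install.
START_COMMAND = ["/path/to/sge/default/common/sgemaster", "start"]

def is_running(process_name):
    """Return True if pgrep finds a process with exactly this name."""
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def ensure_running(process_name, start_command, check=is_running, run=subprocess.run):
    """Start the process if it is not running; return True if a start was attempted."""
    if check(process_name):
        return False
    run(start_command, check=False)
    return True

# From cron, one would call something like:
#   ensure_running("sge_qmaster", START_COMMAND)
```

The `check` and `run` parameters exist only so the logic can be tried out
with stand-in functions before pointing it at a live qmaster.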

Any suggestions or ideas?


-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)
