[GE issues] [Issue 3014] qmaster crash in schedd_mes_add_global/lGetPosViaElem

joga Joachim.Gabler at sun.com
Mon May 4 09:53:00 BST 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3014



User joga changed the following:

                What    |Old value                 |New value
================================================================================
             Assigned to|ernst                     |joga
--------------------------------------------------------------------------------
        Target milestone|---                       |6.2u3
--------------------------------------------------------------------------------




------- Additional comments from joga at sunsource.net Mon May  4 01:52:58 -0700 2009 -------
Evaluation:
I cannot reproduce it, but the code where the core dump happens looks fishy.
The function schedd_mes_add_global retrieves the messages list from thread local storage:
      lListElem *sme = sconf_get_sme();
This data structure is initialized by schedd_mes_initialize,
which is only called in the scheduler thread, but here we are in a worker thread.
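
To illustrate the suspected failure mode, here is a minimal sketch using plain POSIX thread-specific data. It is not the Grid Engine code: every demo_* name is hypothetical, and demo_mes_initialize / demo_get_sme only stand in for schedd_mes_initialize / sconf_get_sme under the assumption that the real functions behave like a per-thread slot that stays empty until the thread initializes it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread messages element (not the real lListElem). */
typedef struct {
    int message_count;
} demo_sme_t;

static pthread_key_t demo_sme_key;
static pthread_once_t demo_sme_once = PTHREAD_ONCE_INIT;

/* Create the key once per process; free() releases a thread's slot when it exits. */
static void demo_make_key(void)
{
    int rc = pthread_key_create(&demo_sme_key, free);
    assert(rc == 0);
}

/* Stand-in for schedd_mes_initialize: fills the slot of the CALLING thread only. */
static void demo_mes_initialize(void)
{
    pthread_once(&demo_sme_once, demo_make_key);
    if (pthread_getspecific(demo_sme_key) == NULL) {
        demo_sme_t *sme = calloc(1, sizeof(*sme));
        assert(sme != NULL);
        pthread_setspecific(demo_sme_key, sme);
    }
}

/* Stand-in for sconf_get_sme: returns NULL if this thread never initialized its slot. */
static demo_sme_t *demo_get_sme(void)
{
    pthread_once(&demo_sme_once, demo_make_key);
    return pthread_getspecific(demo_sme_key);
}

/* "Scheduler" thread: initializes its slot, then reads it. */
static void *demo_scheduler_thread(void *arg)
{
    demo_mes_initialize();
    *(int *)arg = (demo_get_sme() != NULL);
    return NULL;
}

/* "Worker" thread: reads the slot without ever initializing it. */
static void *demo_worker_thread(void *arg)
{
    *(int *)arg = (demo_get_sme() != NULL);
    return NULL;
}

/* Runs fn in a new thread; returns 1 if that thread saw a non-NULL slot, else 0. */
static int demo_run(void *(*fn)(void *))
{
    pthread_t tid;
    int has_sme = -1;
    int rc = pthread_create(&tid, NULL, fn, &has_sme);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    return has_sme;
}
```

In this sketch demo_run(demo_scheduler_thread) sees a valid pointer while demo_run(demo_worker_thread) sees NULL, so any code in the worker that dereferences the result without checking it would crash in the same way as described here.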

The line
   if (!monitor_alpp && sconf_get_schedd_job_info() != SCHEDD_JOB_INFO_FALSE) {
should probably make sure that we do not run into this situation, but it looks as if it fails to do so in some special case.

It must be a case where qsub -w e is called (in this case monitor_alpp is NULL)
and schedd_job_info is switched on.
sconf_get_schedd_job_info accesses thread local storage, and it should always return false
for the worker threads, but it looks as if there is a case where it returns true.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=189691
