Opened 11 years ago

Closed 4 years ago

#314 closed defect (fixed)

IZ1949: 2 instances of non well-formed XML output from "qstat -xml"

Reported by: craffi Owned by: Dave Love <d.love@…>
Priority: high Milestone:
Component: sge Version: 8.1.3
Severity: minor Keywords: clients
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1949]

        Issue #:      1949             Platform:     All      Reporter: craffi (craffi)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.0u7       CC:    None defined
        Status:       STARTED          Priority:     P2
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:        http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7191
       * Summary:     2 instances of non well-formed XML output from "qstat -xml"
   Status whiteboard:
      Attachments:

     Issue 1949 blocks:
   Votes for issue 1949:


   Opened: Mon Dec 19 08:30:00 -0700 2005 
------------------------


Discussed on the dev mailing list via this thread:

http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=7191

================
Problem #1
================

Summary:
'qstat -j <jobID> -xml' produces non-compliant (not well formed) XML output

Reproduce:
Run "qstat -xml -j <jobID>" on a job ID that does not exist

Sample output showing error:

chrisdag:~/sgetest dag$ qstat -j 11 -xml
<?xml version='1.0'?>
<unknown_jobs  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 <>
   <ST_name>11</ST_name>
 </>
</unknown_jobs>

 -- and --

qstat -j 1234 -xml
error: can't unpack gdi request
error: error unpacking gdi request: bad argument
<?xml version='1.0'?>
<unknown_jobs  xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <>
    <ST_name>1234</ST_name>
  </>
</unknown_jobs>
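
For anyone scripting around "qstat -xml" until this is fixed, the symptom above can be spotted before the output reaches a real XML parser. The following is only a sketch: has_empty_tag is a hypothetical helper, not part of Grid Engine, and it tests for this one symptom rather than for well-formedness in general.

```c
#include <string.h>

/* Hypothetical helper (not part of SGE): returns 1 if the text contains an
 * empty start tag "<>" or an empty end tag "</>", 0 otherwise.
 * This only detects the symptom from this report; it is not an XML
 * well-formedness check and could misfire on "<>" inside a CDATA section. */
int has_empty_tag(const char *xml)
{
    return strstr(xml, "<>") != NULL || strstr(xml, "</>") != NULL;
}
```

A caller would read the qstat output into a string and treat a non-zero result as "do not hand this to the parser".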



================
Problem #2
================

Summary:
When there are Grid Engine communication or access problems, "qstat -f -xml"
does not produce XML at all, or in some cases will produce non-well-formed XML
similar to Problem #1 described above

Reproduce:
Less convenient than problem #1. My specific problem was caused by a changing IP
address on the qmaster/execd node. This problem may be reproducible by
intentionally generating a different sort of error.

Example:

chrisdag:~/sgetest dag$ qstat -f -xml

error: commlib error: access denied (client IP resolved to host name
"chrisdag.local". This is not identical to clients host name
"chrisdag-wireless.private.sonsorol.net")
unable to contact qmaster using port 701 on host "chrisdag.local"

   ------- Additional comments from roland Mon Feb 13 01:42:34 -0700 2006 -------
changed priority

   ------- Additional comments from olesen Fri Jun 26 03:45:56 -0700 2009 -------
*** Issue 3057 has been marked as a duplicate of this issue. ***

   ------- Additional comments from olesen Fri Jun 26 03:49:33 -0700 2009 -------
increase priority
The badly formatted <> and </> tags should be fairly trivial to avoid.

   ------- Additional comments from templedf Thu Dec 3 06:56:05 -0700 2009 -------
From Mark Olesen:

I suspect that the problem is in libs/cull/cull_xml.c, in
lWriteListXML_ (not lWriteElemXML_).


There is no check before the corresponding fprintf.
E.g.,

{
  fprintf(fp, "%s<%s%s>", indent, lGetString(elem, XMLA_Name),
     (is_attr?sge_dstring_get_string(&attr):""));
  fprintf(fp, "%s", lGetString(elem, XMLA_Value));
  lWriteListXML_(lGetList(ep, XMLE_List), nesting_level+1, fp);
  fprintf(fp, "</%s>\n", lGetString(elem, XMLA_Name));
}


Wouldn't it just be a simple case of doing this?

{
  const char* tag = lGetString(elem, XMLA_Name);

  if (tag != NULL && strlen(tag))
  {
    fprintf(fp, "%s<%s%s>", indent, tag,
      (is_attr?sge_dstring_get_string(&attr):""));
    fprintf(fp, "%s", lGetString(elem, XMLA_Value));
  }
  lWriteListXML_(lGetList(ep, XMLE_List), nesting_level+1, fp);

  if (tag != NULL && strlen(tag))
  {
    fprintf(fp, "</%s>\n", tag);
  }
}
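
For illustration only, here is a self-contained sketch of the same guard outside the CULL code. struct xml_node, has_tag and write_nodes are hypothetical stand-ins, not the real cull_xml.c types or functions. As in the suggestion above, an element with a NULL or empty name gets no open tag, no value and no close tag, but its children are still written. Values are written unescaped to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a CULL XML element; not the real SGE type. */
struct xml_node {
    const char *tag;               /* element name; may be NULL or "" */
    const char *value;             /* text content; may be NULL */
    const struct xml_node *child;  /* first child, or NULL */
    const struct xml_node *next;   /* next sibling, or NULL */
};

/* True only if the node has a usable (non-NULL, non-empty) name. */
int has_tag(const struct xml_node *n)
{
    return n->tag != NULL && n->tag[0] != '\0';
}

/* Write a sibling list. Unnamed nodes produce no markup of their own,
 * so "<>" and "</>" can never be emitted; their children are written
 * at the unnamed node's own depth. */
void write_nodes(FILE *fp, const struct xml_node *n, int depth)
{
    assert(fp != NULL);
    for (; n != NULL; n = n->next) {
        int named = has_tag(n);

        if (named) {
            fprintf(fp, "%*s<%s>", depth * 2, "", n->tag);
            if (n->value != NULL)
                fputs(n->value, fp);   /* NOTE: no XML escaping here */
            if (n->child != NULL)
                fputc('\n', fp);
        }

        write_nodes(fp, n->child, named ? depth + 1 : depth);

        if (named) {
            if (n->child != NULL)
                fprintf(fp, "%*s", depth * 2, "");
            fprintf(fp, "</%s>\n", n->tag);
        }
    }
}
```

Given a tree shaped like the one in Problem #1 (an unknown_jobs root, an unnamed element, then ST_name with value 11), this writes ST_name directly under unknown_jobs and no empty tags. Whether the unnamed wrapper should instead be given a proper name is a separate question that this sketch does not settle.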

Change History (3)

comment:1 Changed 6 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

Fixed by [3581/sge].

comment:2 Changed 4 years ago by dlove

  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Version changed from 6.0u7 to 8.1.3

This isn't properly fixed. Non-running qmaster results in

<?xml version='1.0'?>
<comunication_error >
  <>
    <AN_status>11</AN_status>
    <AN_text>unable to send message to qmaster using port 6444 on host &quot;albion&quot;: got message acknowledge error</AN_text>
    <AN_quality>1</AN_quality>
  </>
</comunication_error>

and with no active jobs, qstat -u \* -j \* -xml gives

<?xml version='1.0'?>
<unknown_jobs  xmlns:xsd="http://arc.liv.ac.uk/repos/darcs/sge/source/dist/util/resources/schemas/qstat/qstat.xsd">
  <>
    <ST_name>*</ST_name>
  </>
</unknown_jobs>

"unknown_jobs" isn't in the schema, and you'd expect an empty list in this case.

comment:3 Changed 4 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from reopened to closed

In 4359/sge:

Fix #314 more: extra cases of null tags in lWriteElemXML_
Also correct <comunication_error> (not in a schema, but could break
anything that looks for the mis-spelling)
