Opened 11 years ago

Closed 6 years ago

#588 closed defect (fixed)

IZ2767: qping info output always shows warning/error

Reported by: juby
Owned by: Dave Love <d.love@…>
Priority: normal
Milestone:
Component: sge
Version: 6.2
Severity: minor
Keywords: qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2767]

        Issue #:      2767             Platform:     All      Reporter: juby (juby)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.2         CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     qping info output always shows warning/error
   Status whiteboard:
      Attachments:

     Issue 2767 blocks:
   Votes for issue 2767:


   Opened: Fri Oct 31 08:30:00 -0700 2008 
------------------------


I've noticed that, out of the box, the qping 'info' output for my 6.2 qmaster
looks like this:

: qping -info host $SGE_QMASTER_PORT qmaster 1
10/31/2008 11:20:02:
SIRM version:             0.1
SIRM message id:          1
start time:               10/30/2008 11:18:37 (1225379917)
run time [s]:             86485
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 3
status:                   2
info:                     MAIN: E (86485.35) | signaler000: E (86484.85) |
event_master000: E (0.25) | timer000: E (5.25) | worker000: W (84.24) |
worker001: W (9.25) | listener000: W (2.53) | listener001: W (6.55) |
scheduler000: W (9.24) | ERROR
malloc:                   arena(1228800) |ordblks(743) | smblks(1) | hblksr(0) |
hblhkd(0) usmblks(0) | fsmblks(48) | uordblks(929344) | fordblks(299456) |
keepcost(76824)
Monitor:                  disabled


It looks like this pretty much all the time, even under the following conditions:
- no visible problems (jobs are queued and run; commands like qstat, qconf,
etc. work; qmon is functional)
- a clean, minimal install of 6.2 using Berkeley DB RPC spooling (no execd, no
shadowd, no ARCo)
- a clean, minimal install of 6.2 using classic spooling (no execd, no shadowd,
no ARCo)


In addition, my 6.2 / Berkeley DB RPC install went through a few days of
relatively frequent deadlocks, with errors of the form:

|E|error writing object with key "JOB:     259" into berkeley database: (-30995)
DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock

These errors continued after restarts of the RPC server and qmaster. The
spooling database is stored on a local filesystem. (The database has since been
rebuilt, and I have not seen a deadlock since rebuilding.)

Also, qping against the execds always reports a warning:

: qping -info exechost $SGE_EXECD_PORT execd 1
10/31/2008 11:13:21:
SIRM version:             0.1
SIRM message id:          1
start time:               10/31/2008 11:04:49 (1225465489)
run time [s]:             512
messages in read buffer:  0
messages in write buffer: 0
nr. of connected clients: 2
status:                   1
info:                     sge_execd_process_messages: W (509.61) | WARNING
malloc:                   arena(135168) |ordblks(26) | smblks(3) | hblksr(0) |
hblhkd(0) usmblks(0) | fsmblks(112) | uordblks(77240) | fordblks(57928) |
keepcost(51464)
Monitor:                  disabled


lx24-x86
Fedora Core 4

: qstat -help | head -1
GE 6.2
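
The 'info:' lines in the two qping outputs above share one shape: a
'|'-separated list of '<thread>: <letter> (<number>)' entries, followed by a
summary word. Below is a minimal sketch that splits such a line (after joining
the wrapped lines back into one) into per-thread entries and derives the worst
state. It assumes that W ranks as warning and E as error. Under that
assumption, the result matches the reported 'status:' values for these samples
(2/ERROR for qmaster, 1/WARNING for execd). The severity table and the names
parse_info and worst_status are hypothetical, not part of qping or SGE. The
number in parentheses is kept but not interpreted.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape of one entry in qping's "info:" field, e.g. "worker000: W (84.24)".
ENTRY = re.compile(r"^(?P<name>\w+):\s*(?P<state>[A-Z])\s*\((?P<value>[\d.]+)\)$")

# Hypothetical ranking of state letters; any other letter is treated as OK (0).
SEVERITY = {"W": 1, "E": 2}
SUMMARY = {0: "OK", 1: "WARNING", 2: "ERROR"}

Entry = Tuple[str, str, float]


def parse_info(info: str) -> Tuple[List[Entry], Optional[str]]:
    """Split an info value (joined onto one line) into per-thread entries
    and the trailing summary word, if any."""
    entries: List[Entry] = []
    summary: Optional[str] = None
    for part in (p.strip() for p in info.split("|")):
        match = ENTRY.match(part)
        if match:
            entries.append((match["name"], match["state"], float(match["value"])))
        elif part:
            # A non-empty field that is not a thread entry, e.g. "ERROR".
            summary = part
    return entries, summary


def worst_status(entries: List[Entry]) -> int:
    """Highest assumed severity among the thread states (0 if there are none)."""
    return max((SEVERITY.get(state, 0) for _, state, _ in entries), default=0)


if __name__ == "__main__":
    # The qmaster "info:" value from the report, rejoined onto one line.
    qmaster_info = (
        "MAIN: E (86485.35) | signaler000: E (86484.85) | "
        "event_master000: E (0.25) | timer000: E (5.25) | worker000: W (84.24) | "
        "worker001: W (9.25) | listener000: W (2.53) | listener001: W (6.55) | "
        "scheduler000: W (9.24) | ERROR"
    )
    entries, summary = parse_info(qmaster_info)
    for name, state, value in entries:
        print(f"{name:16} {state} {value}")
    status = worst_status(entries)
    print("derived:", status, SUMMARY[status], "| reported:", summary)
```

On the qmaster sample, every thread comes back as W or E. This includes
event_master000: E (0.25). If the number is a time since the thread's last
activity, a sub-second value would be hard to reconcile with a genuine timeout.
That would fit the later diagnosis that the thread timeout monitoring itself
was at fault.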

   ------- Additional comments from juby Fri Oct 31 08:31:13 -0700 2008 -------
Assigning OS to Linux.

   ------- Additional comments from crei Thu Feb 26 07:25:34 -0700 2009 -------
I think this problem is not specific to Linux.

Change History (2)

comment:1 Changed 9 years ago by dlove

  • Severity set to minor

Word from the developers was that qping is broken generally.

comment:2 Changed 6 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4671/sge:

Fix #588: Correct thread timeout monitoring (spurious errors from qping)
