Opened 11 years ago

Closed 8 years ago

#495 closed defect (fixed)

IZ2530: execd: "ptf complains: Job does not exist" message in log file

Reported by: jogoodma Owned by:
Priority: normal Milestone:
Component: sge Version: 6.1u3
Severity: minor Keywords: Linux execution
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2530]

        Issue #:      2530             Platform:     All      Reporter: jogoodma (jogoodma)
       Component:     gridengine          OS:        Linux
     Subcomponent:    execution        Version:      6.1u3       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
          URL:
       * Summary:     execd: "ptf complains: Job does not exist" message in log file
   Status whiteboard:
      Attachments:

     Issue 2530 blocks:
   Votes for issue 2530:  5


   Opened: Thu Mar 20 08:32:00 -0700 2008 
------------------------


This issue may be a duplicate of
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1806.

I'm seeing the same type of warning messages.  In the execution host messages
file I have:

reaping job "54988" ptf complains: Job does not exist

and then 1 second later in the qmaster file I have:

"job 54988.1 finished on host <hostname>"

I can reproduce the problem using a sleeper.sh script similar to the one
described in 1806. Setting the sleep time to 1 does not produce any warning
messages. The jobs complete normally with correct output; however, every
couple of weeks I experience a complete queue lockup. Jobs are shown as
running, but nothing is running on the exec host. Killing the jobs removes
them from the queue, but the next jobs that fill the slots get stuck in the
same manner. Restarting SGE is the only solution I have found when this
problem occurs. No other error messages have been observed in the logs to
indicate a separate problem.

I have reproduced this with a single master/exec host setup running RHEL 5
and with a separate master and exec host setup, both hosts running RHEL 4.
All setups were tested with fresh installs of 6.1u3.

Change History (1)

comment:1 Changed 8 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

Apparently fixed in 6.1u6 as a duplicate of IZ1806.
