Opened 13 years ago

Last modified 11 years ago

#487 new enhancement

IZ2488: Checkpoint cleanup after qdel of pending job

Reported by: sgaure Owned by:
Priority: normal Milestone:
Component: sge Version: 6.1u3
Severity: Keywords: qmaster


[Imported from gridengine issuezilla]

        Issue #:      2488             Platform:     All           Reporter: sgaure (sgaure)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.1u3            CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
       * Summary:     Checkpoint cleanup after qdel of pending job
   Status whiteboard:

     Issue 2488 blocks:
   Votes for issue 2488:

   Opened: Tue Feb 12 06:16:00 -0700 2008 

We have implemented two application-level checkpoint environments in our sge
installation, one based on BLCR and the other on VMware. The integration with
sge is transparent to the users: a starter_method script handles job startup,
in conjunction with the scripts in the checkpoint environment.

However, one issue remains. When a job is migrated, put back in the Rq state,
and then deleted while in that state, sge runs no cleanup. For these jobs we
therefore need an external cleanup of the type "remove checkpoint directories
for jobs that no longer exist".
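An external cleanup of that type could look roughly like the following Python sketch. It assumes (hypothetically) that each job's checkpoint data lives in one directory named after the job ID, directly under a common checkpoint root, and that the set of live jobs is taken from plain `qstat -u '*'` output. The function names and the directory layout are illustrative, not part of sge. Directories modified recently are skipped, so a job submitted after the `qstat` snapshot does not lose a freshly created checkpoint directory.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import time
from pathlib import Path


def parse_qstat_job_ids(qstat_output: str) -> set[str]:
    """Return the job IDs found in plain `qstat` table output.

    Assumes the usual layout: a header line, a separator line of dashes,
    then one row per job (or array task) with the numeric job ID first.
    """
    ids: set[str] = set()
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(fields[0])
    return ids


def remove_orphaned_checkpoint_dirs(
    ckpt_root: Path,
    live_job_ids: set[str],
    min_age_seconds: float = 3600.0,
    dry_run: bool = False,
) -> list[Path]:
    """Delete checkpoint directories of jobs that sge no longer knows about.

    Hypothetical layout: one directory per job, named by its job ID,
    directly under ckpt_root. Non-numeric entries are left alone, and so
    are directories modified within the last min_age_seconds.
    """
    now = time.time()
    removed: list[Path] = []
    for entry in sorted(ckpt_root.iterdir()):
        if not entry.is_dir() or not entry.name.isdigit():
            continue  # not a per-job checkpoint directory
        if entry.name in live_job_ids:
            continue  # job still exists (running, pending, Rq, ...)
        if now - entry.stat().st_mtime < min_age_seconds:
            continue  # too new: may belong to a job submitted after qstat ran
        if not dry_run:
            shutil.rmtree(entry)
        removed.append(entry)
    return removed


def main(argv: list[str]) -> None:
    # check=True raises if qstat fails (e.g. qmaster unreachable), so an
    # empty job list caused by an error never triggers a mass deletion.
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    live = parse_qstat_job_ids(result.stdout)
    for path in remove_orphaned_checkpoint_dirs(Path(argv[0]), live):
        print("removed", path)


# Usage: cleanup_ckpt.py CKPT_ROOT  (does nothing without exactly one argument)
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1:])
```

Such a script would typically run periodically (for example from cron) on a host that sees the checkpoint root. Running it first with `dry_run=True` shows what would be deleted.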

There should be a provision in the checkpoint environment for running a cleanup
script when a job is deleted in this way. It is not obvious *where* such a
script should run: it could be run as root by qmaster, on the last node the job
ran on (if that node still exists), or on a node specified in the configuration.
Alternatively, it might be implemented as a general "qdel-hook" in qmaster.
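The question of where to run the cleanup script could be expressed as a simple fallback policy. The Python sketch below is purely illustrative: the function and its parameters are hypothetical (nothing like it exists in qmaster), and the preference order is only one way to combine the three options listed above.

```python
from __future__ import annotations


def choose_cleanup_host(
    last_exec_host: str | None,
    active_exec_hosts: set[str],
    configured_host: str | None,
    qmaster_host: str,
) -> str:
    """Pick where a hypothetical qdel-time checkpoint cleanup script runs.

    One possible preference order among the options in this ticket:
      1. the last node the job ran on, if it is still an active exec host
         (it is the most likely place for node-local checkpoint files);
      2. a node named in the (hypothetical) checkpoint configuration;
      3. otherwise the qmaster host, running the script as root.
    """
    if last_exec_host is not None and last_exec_host in active_exec_hosts:
        return last_exec_host
    if configured_host:
        return configured_host
    return qmaster_host
```

Preferring the last execution host matters mainly when checkpoints are kept on node-local disks; with a shared checkpoint directory, running on the qmaster host or a fixed configured node would be enough.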

Change History (0)
