Opened 13 years ago
Last modified 10 years ago
#487 new enhancement
IZ2488: Checkpoint cleanup after qdel of pending job
Reported by: | sgaure | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.1u3 |
Severity: | | Keywords: | qmaster |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=2488]
Issue #: 2488
Platform: All
Reporter: sgaure (sgaure)
Component: gridengine
OS: All
Subcomponent: qmaster
Version: 6.1u3
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: ernst (ernst)
QA Contact: ernst
URL:
Summary: Checkpoint cleanup after qdel of pending job
Status whiteboard:
Attachments:
Issue 2488 blocks:
Votes for issue 2488:

Opened: Tue Feb 12 06:16:00 -0700 2008
------------------------

We have implemented two application-level checkpoint environments in our sge installation: one is based on BLCR, the other on VMWARE. The integration with sge has been made transparent to the users by letting a starter_method script handle the startup (in conjunction with the scripts in the ckpt environment).

However, an issue remains. When a job is migrated and put back in Rq state, and is then deleted while in this state, sge runs no cleanup. Thus, for these jobs we must use an external cleanup (of the type "remove checkpoint dirs for non-existent jobs").

There should be a provision in the checkpoint environment for running a cleanup script when a job is deleted in this way. It is of course not obvious *where* to run such a script: it could perhaps be run as root by qmaster, on the last node the job ran on (if it still exists), or on a configuration-specified node. It might also be implemented as a general "qdel-hook" in qmaster.
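For illustration, an external cleanup of the type "remove checkpoint dirs for non-existent jobs" might look roughly like the sketch below. It is not the reporter's actual script. It assumes a hypothetical layout in which each job's checkpoint data lives in a subdirectory named after its numeric job ID, and it assumes that plain `qstat -u "*"` output starts each job line with that job ID. The function names and the checkpoint root are placeholders.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from pathlib import Path


def parse_job_ids(qstat_output: str) -> set[str]:
    """Collect the leading numeric job ID from each job line of qstat output.

    Assumption: job lines begin with optional whitespace and the job ID;
    header and separator lines do not start with a digit and are skipped.
    """
    ids = set()
    for line in qstat_output.splitlines():
        match = re.match(r"\s*(\d+)\s", line)
        if match:
            ids.add(match.group(1))
    return ids


def query_live_job_ids() -> set[str]:
    """Ask qstat for the jobs of all users (pending, running, suspended)."""
    result = subprocess.run(
        ["qstat", "-u", "*"], capture_output=True, text=True, check=True
    )
    return parse_job_ids(result.stdout)


def stale_checkpoint_dirs(ckpt_root: Path, live_ids: set[str]) -> list[Path]:
    """Return numeric-named subdirectories whose job ID is not in live_ids.

    Assumption (hypothetical layout): one subdirectory per job, named by job ID.
    Entries with non-numeric names are never treated as stale.
    """
    return sorted(
        entry
        for entry in ckpt_root.iterdir()
        if entry.is_dir() and entry.name.isdigit() and entry.name not in live_ids
    )


def cleanup(ckpt_root: Path, live_ids: set[str], dry_run: bool = True) -> list[Path]:
    """Report stale checkpoint directories and delete them unless dry_run is set."""
    stale = stale_checkpoint_dirs(ckpt_root, live_ids)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale
```

A periodic job could call `cleanup(Path("/path/to/checkpoints"), query_live_job_ids())` to list candidates, and pass `dry_run=False` once the list looks right. Because `check=True` raises when `qstat` fails, a failed query is not mistaken for an empty job list. This polling approach is the workaround the ticket describes, not the requested cleanup hook in the checkpoint environment.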