Opened 19 years ago
Last modified 10 years ago
#4 new enhancement
IZ61: Enhancements for ckpt/reschedule facility
Reported by: | ernst | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | current |
Severity: | | Keywords: | qmaster |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=61]
Issue #: 61
Platform: All
Reporter: ernst (ernst)
Component: gridengine
OS: All
Subcomponent: qmaster
Version: current
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: ENHANCEMENT
Target milestone: ---
Assigned to: andreas (andreas)
QA Contact: ernst
URL:
* Summary: Enhancements for ckpt/reschedule facility
Status whiteboard:
Attachments:
Issue 61 blocks:
Votes for issue 61:
Opened: Mon Sep 17 05:01:00 -0700 2001
------------------------
According to the discussion with Martin Klook and Ron Chen (users@gridengine.sunsource.net; subject: reschedule facility), the following additional information would be helpful in case of checkpointing and (automatic) rescheduling:

- Number of rescheduling/checkpointing events
- Host where the job was first/previously executed
- In case of rescheduling: for ckpt-jobs the restart_command should be executed if ckpt_command was executed previously.

The RESTARTED environment variable, which is set in the job environment, could provide the number of events. FIRST_HOST and LAST_HOST may be set accordingly. We have to make sure that the job was able to execute the ckpt_command successfully before we report a hostname through one of these variables.

The restart_command can only be executed in case of rescheduling, when the master knows that ckpt_command was successfully executed. We have to transfer this information (shepherd -> execd -> qmaster) during the runtime of the job.

------- Additional comments from ernst Thu Jul 11 08:10:13 -0700 2002 -------
The facility defined in Issue #315 might be used to work around the missing functionality. "qsub/qalter -ac ENV:variable=value" might be used in the various checkpointing scripts to define such variables as FIRST_HOST, LAST_HOST or RESTARTED.

------- Additional comments from sgrell Mon Dec 12 03:14:46 -0700 2005 -------
Changed subcomponent.

Stephan
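
The bookkeeping requested in the description (RESTARTED, FIRST_HOST, LAST_HOST, and the rule for when restart_command may run) could be sketched roughly as follows. This is only an illustration of the proposed semantics, not Grid Engine code: the names CkptState, on_checkpoint, on_reschedule, job_environment and qalter_args are hypothetical, counting every reschedule in RESTARTED is an assumption, and the "-ac ENV:variable=value" form is copied from the comment about Issue #315 without verifying that it is implemented. The sketch only builds the argument list; it does not call qalter.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CkptState:
    """Hypothetical per-job record the qmaster would have to keep."""
    restarted: int = 0                 # number of reschedule events so far
    first_host: Optional[str] = None   # host of the first successful checkpoint
    last_host: Optional[str] = None    # host of the latest successful checkpoint


def on_checkpoint(state: CkptState, host: str, exit_status: int) -> CkptState:
    """Record a host only if ckpt_command succeeded there (exit status 0)."""
    if exit_status != 0:
        return state  # failed checkpoint: do not expose this hostname
    return replace(state, first_host=state.first_host or host, last_host=host)


def on_reschedule(state: CkptState) -> Tuple[CkptState, bool]:
    """Count the event and say whether restart_command may be executed."""
    # restart_command is allowed only after a known successful checkpoint.
    use_restart_command = state.last_host is not None
    return replace(state, restarted=state.restarted + 1), use_restart_command


def job_environment(state: CkptState) -> Dict[str, str]:
    """Variables that would be set in the job environment."""
    env = {"RESTARTED": str(state.restarted)}
    if state.first_host is not None:
        env["FIRST_HOST"] = state.first_host
    if state.last_host is not None:
        env["LAST_HOST"] = state.last_host
    return env


def qalter_args(job_id: str, state: CkptState) -> List[str]:
    """Build (not run) a qalter call in the form quoted from Issue #315.

    Assumption: the ENV: prefix is taken from the ticket text as-is.
    """
    args = ["qalter"]
    for name, value in job_environment(state).items():
        args += ["-ac", f"ENV:{name}={value}"]
    return args + [job_id]
```

For example, a failed checkpoint on one host followed by a successful one on "nodeA", a reschedule, and a successful checkpoint on "nodeB" would leave FIRST_HOST=nodeA and LAST_HOST=nodeB, whereas a job rescheduled before any successful checkpoint would get no hostnames and would not run restart_command.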