Opened 8 years ago
Closed 8 years ago
#1449 closed enhancement (fixed)
Behavior of checkpoint_method on job termination
| Reported by: | wish | Owned by: | Dave Love <d.love@…> |
|---|---|---|---|
| Priority: | normal | Milestone: | |
| Component: | sge | Version: | 6.2u3 |
| Severity: | minor | Keywords: | |
| Cc: | | | |
Description
If a job is terminated (with ENABLE_ADDGRP_KILL=true) while it is being checkpointed, the ckpt_command is not killed by Grid Engine. This can cause issues with some checkpointing tools (e.g. the ompi-checkpoint command from Open MPI when used with BLCR), which don't terminate if you kill the processes they are trying to checkpoint. This isn't too hard to work around, but it should be documented.
Possibly one could delay termination of a job until after ckpt_command has finished running.
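As a rough illustration of the "delay termination" idea, here is a minimal Python sketch. It is not taken from Grid Engine. It assumes a site-specific convention in which a wrapper around ckpt_command creates a lock file before starting the checkpoint tool and removes it afterwards. The lock file path and the function names `wait_for_checkpoint` and `terminate_after_checkpoint` are hypothetical.

```python
import os
import signal
import time


def wait_for_checkpoint(lock_path, timeout=300.0, poll=1.0):
    """Return True once lock_path is absent, False if the timeout expires first.

    lock_path is a hypothetical file that a ckpt_command wrapper would create
    while a checkpoint is in progress and delete when it finishes.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def terminate_after_checkpoint(pid, lock_path, timeout=300.0, poll=1.0):
    """Send SIGTERM to pid after any in-progress checkpoint has ended.

    If the checkpoint is still running when the timeout expires, the signal
    is sent anyway so that the job cannot block termination forever.
    Returns True if the checkpoint finished in time, False otherwise.
    """
    finished = wait_for_checkpoint(lock_path, timeout, poll)
    os.kill(pid, signal.SIGTERM)
    return finished
```

Something along these lines could be called from a custom termination script, provided the checkpoint wrapper reliably removes the lock file even when the checkpoint tool fails.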
Change History (2)
comment:1 Changed 8 years ago by dlove
comment:2 Changed 8 years ago by Dave Love <d.love@…>
- Owner set to Dave Love <d.love@…>
- Resolution set to fixed
- Status changed from new to closed
In 4490/sge:
Is the additional group not actually added to the command currently?
That would include the hook commands in the accounting. It's not clear
to me if that's appropriate but it seems reasonable. WDYT?