Opened 8 years ago
Last modified 8 years ago
#1346 new defect
"running job ... that was not supposed to be there" after node dies
Reported by: | dlove | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 8.0.0a |
Severity: | minor | Keywords: | |
Cc: | | | |
Description
When a crashed node (at least one with a slave parallel task) restarts, at least without the ENABLE_RESCHEDULE_... qmaster_params, the job isn't killed, and you can get repeated messages like
execd@lvgig080.nw-grid.ac.uk reports running job (84722.1/master) in queue "parallel@lvgig080.nw-grid.ac.uk" that was not supposed to be there - killing
without the job being killed.
May be related to #252.
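A rough way to confirm the symptom is to count how often the message recurs for the same job and queue. The following is only a sketch: the regular expression is derived from the single message quoted above (the wording in other versions is an assumption), and the function names `count_stale_job_reports` and `repeated` are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one message quoted in this ticket; other
# versions may word it differently (assumption).
PATTERN = re.compile(
    r'reports running job \((?P<job>\d+\.\d+)/(?P<task>[^)]+)\) '
    r'in queue "(?P<queue>[^"]+)" that was not supposed to be there'
)


def count_stale_job_reports(lines):
    """Count matching messages per (job.task id, queue instance)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("job"), match.group("queue"))] += 1
    return counts


def repeated(counts, threshold=2):
    """Keep only entries reported at least `threshold` times."""
    return {key: n for key, n in counts.items() if n >= threshold}
```

Feeding it the lines of the qmaster messages file (its location depends on the installation) and passing the result to `repeated` lists jobs that keep being reported as "killing" but evidently remain on the node.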
See also an example in http://gridengine.org/pipermail/users/2011-August/001454.html,
but it also happens with a shared spool.