Opened 8 years ago

Last modified 8 years ago

#1346 new defect

"running job ... that was not supposed to be there" after node dies

Reported by: dlove
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 8.0.0a
Severity: minor
Keywords:
Cc:

Description

When a crashed node (at least one with a slave parallel task) restarts, at least without the ENABLE_RESCHEDULE_... qmaster_params, the job isn't killed, and you can get repeated messages like

execd@lvgig080.nw-grid.ac.uk reports running job (84722.1/master) in queue "parallel@lvgig080.nw-grid.ac.uk" that was not supposed to be there - killing

without the job being killed.
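
To check how often this is happening, the repeated reports can be counted per host and job. The following is a minimal sketch, not part of Grid Engine: the regular expression is modelled only on the single message quoted above, so other versions may word the message differently, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one message quoted in this ticket (an assumption;
# other Grid Engine versions may format the message differently).
STALE_JOB_RE = re.compile(
    r'execd@(?P<host>\S+) reports running job '
    r'\((?P<job>\d+)\.(?P<task>\d+)/(?P<pe_task>[^)]+)\) '
    r'in queue "(?P<queue>[^"]+)" '
    r'that was not supposed to be there - killing'
)


def count_stale_job_reports(lines):
    """Count how often each (host, job id, task id) is reported as unexpected."""
    counts = Counter()
    for line in lines:
        m = STALE_JOB_RE.search(line)
        if m:
            key = (m.group("host"), int(m.group("job")), int(m.group("task")))
            counts[key] += 1
    return counts


def repeated_reports(lines, threshold=2):
    """Return only the entries reported at least `threshold` times."""
    counts = count_stale_job_reports(lines)
    return {key: n for key, n in counts.items() if n >= threshold}
```

Passing the lines of the qmaster messages file (its location depends on the installation) to `repeated_reports` lists the jobs for which "killing" was logged more than once, which is the symptom described here.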

May be related to #252.

Change History (1)

comment:1 Changed 8 years ago by dlove

See also the example in http://gridengine.org/pipermail/users/2011-August/001454.html,
but it also happens with a shared spool.
