Opened 5 years ago

Closed 5 years ago

#1511 closed defect (fixed)

execd does not remember core binding assignments across restart

Reported by: markdixon
Owned by: Dave Love <d.love@…>
Priority: normal
Milestone:
Component: sge
Version: 8.1.6
Severity: minor
Keywords:
Cc:

Description

Hi,

At the moment, the execd decides which cores to bind when a job requests core binding.

If the execd is restarted without killing running jobs, it forgets which cores it has assigned to which job. This means it can assign the same cores to a new job before the old job has released them.

The core binding information is stored by the execd in its execd_spool_dir, so it should be possible to read it back on startup.
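A minimal sketch in C of the approach suggested above: on startup, rebuild the set of busy cores from the recorded bindings of jobs that are still running, and only then hand out cores to new jobs. The `struct job_binding` record, the function names and the 64-core bitmask are hypothetical simplifications for illustration; they are not SGE's actual spool format or internal API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one running job's binding, as it might be
 * re-read from the execd spool directory after a restart. */
struct job_binding {
    unsigned long job_id;
    size_t ncores;
    const int *cores;   /* logical core ids bound to this job */
};

/* Rebuild the set of busy cores (one bit per core, at most 64 cores)
 * from the bindings of jobs that survived the restart. */
uint64_t rebuild_busy_mask(const struct job_binding *jobs, size_t njobs)
{
    uint64_t busy = 0;
    for (size_t i = 0; i < njobs; i++) {
        for (size_t k = 0; k < jobs[i].ncores; k++) {
            int core = jobs[i].cores[k];
            assert(core >= 0 && core < 64);
            busy |= UINT64_C(1) << core;
        }
    }
    return busy;
}

/* Pick `want` free cores among the first `total`, write their ids to
 * `out` and mark them busy. Returns `want` on success; returns 0 and
 * leaves `*busy` unchanged if there are not enough free cores. */
size_t assign_free_cores(uint64_t *busy, int total, size_t want, int *out)
{
    assert(total >= 0 && total <= 64);
    size_t got = 0;
    for (int c = 0; c < total && got < want; c++) {
        if (!(*busy & (UINT64_C(1) << c)))
            out[got++] = c;
    }
    if (got < want)
        return 0;
    for (size_t i = 0; i < got; i++)
        *busy |= UINT64_C(1) << out[i];
    return got;
}
```

For example, if jobs bound to cores {0, 1} and {4} survive the restart, a new two-core job on an 8-core host gets cores 2 and 3 rather than reusing 0 and 1, which is the double assignment described above.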

Alternatively, moving the core binding decisions into the qmaster would fix both this and #1479, but is obviously a much bigger job.

Mark

Change History (2)

comment:1 Changed 5 years ago by markdixon

  • Version changed from 8.1.7 to 8.1.6

comment:2 Changed 5 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4794/sge:

Fix #1511: fix logic in account_job actually to update the topology
