Opened 4 years ago

Last modified 3 years ago

#1554 new defect

Out of date user/project objects in spool

Reported by: markdixon
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 8.1.8
Severity: minor
Keywords:
Cc:

Description

gridengine tries hard not to do too much I/O, so it rate-limits changes to the user/project objects in the spool.

This can mean that little-used objects are left out of date in the spool, running the risk of losing usage data in the event of a qmaster restart (until #1551 is fixed) or crash.
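
To make the failure mode concrete, here is a minimal toy model in Python. It is not gridengine code: the class name, the per-object timer and the 240-second interval are all assumptions made for illustration. It only shows how an object that is updated once inside the rate-limit window, and then never touched again, keeps a stale copy on disk.

```python
class RateLimitedSpool:
    """Toy model (hypothetical): each object is written to disk at most
    once per `interval` seconds; other updates stay in memory only."""

    def __init__(self, interval):
        self.interval = interval
        self.memory = {}      # current in-memory usage per object
        self.disk = {}        # usage as last written to the spool
        self.last_write = {}  # time of the last spool write per object

    def update(self, name, usage, now):
        """Record new usage; return True if it was also spooled."""
        self.memory[name] = usage
        last = self.last_write.get(name)
        if last is None or now - last >= self.interval:
            self.disk[name] = usage
            self.last_write[name] = now
            return True
        return False  # rate limited: the disk copy is now out of date

    def stale(self):
        """Names whose spooled usage differs from the in-memory usage."""
        return sorted(n for n, u in self.memory.items()
                      if self.disk.get(n) != u)


spool = RateLimitedSpool(interval=240)   # interval chosen arbitrarily
spool.update("alice", 10.0, now=0)       # first write is spooled
spool.update("alice", 12.5, now=30)      # inside the window: not spooled
# "alice" is a little-used object, so no later update triggers a write.
print(spool.stale())                     # ['alice']
```

In this model a restart or crash after the second update would recover 10.0 for "alice" rather than 12.5, which is the kind of usage loss described above.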

Setting the qmaster_params option STREE_SPOOL_INTERVAL to less than the scheduling interval might mitigate this, but scheduler config options like flush_submit_sec and flush_finish_sec will probably outfox it.

The qmaster thread should perhaps use its answer list to let the sched thread know when it has rate limited a spool write (look for Follow_Control.is_spooling), so that sge_build_sgeee_orders knows to collect and resend changes from earlier seqnos.
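
The feedback idea can be sketched with another toy model in Python. Again, none of this is gridengine code: ToyQmaster, ToyScheduler, the method names and the 240-second interval are hypothetical, and how the "skipped" information would really travel between the threads is left open. The sketch only shows the intended effect: the sender remembers changes that were not spooled and includes them in its next batch.

```python
class ToyQmaster:
    """Hypothetical receiver: applies orders and reports the names
    whose spool write was skipped because of rate limiting."""

    def __init__(self, interval):
        self.interval = interval
        self.disk = {}        # usage as last written to the spool
        self.last_write = {}  # time of the last spool write per object

    def handle_orders(self, orders, now):
        skipped = []
        for name, usage in orders.items():
            last = self.last_write.get(name)
            if last is None or now - last >= self.interval:
                self.disk[name] = usage
                self.last_write[name] = now
            else:
                skipped.append(name)  # tell the sender to try again
        return skipped


class ToyScheduler:
    """Hypothetical sender: keeps changes that were not spooled and
    resends them together with the next batch of changes."""

    def __init__(self):
        self.pending = {}  # name -> latest usage not yet spooled

    def build_orders(self, changes):
        orders = dict(self.pending)  # changes from earlier runs
        orders.update(changes)       # newer values take precedence
        return orders

    def note_skipped(self, orders, skipped):
        self.pending = {name: orders[name] for name in skipped}


qmaster = ToyQmaster(interval=240)  # interval chosen arbitrarily
sched = ToyScheduler()
runs = [(0, {"alice": 10.0}), (30, {"alice": 12.5}), (300, {"bob": 3.0})]
for now, changes in runs:
    orders = sched.build_orders(changes)
    skipped = qmaster.handle_orders(orders, now)
    sched.note_skipped(orders, skipped)

print(qmaster.disk)  # {'alice': 12.5, 'bob': 3.0}
```

The third run only changes "bob", but because the second run reported "alice" as skipped, her latest value is resent and reaches the spool once the rate-limit window has passed.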

Change History (1)

comment:1 Changed 3 years ago by markdixon

The answer list returned after the scheduler sends the qmaster its orders doesn't seem like the best place to fix this after all. Is it only used for logging purposes rather than exception handling?

Need to think through the command and coordination between the main qmaster thread and the scheduling thread.
