[GE users] restart job
John.Sikorski at regeneron.com
Mon Nov 22 19:13:15 GMT 2004
I'm using SGE 5.3 and would like to configure it to restart jobs when an
execution host goes down. I'm not concerned about resuming where the
job left off, so restarting from the beginning is fine. On my current
installation, if a node goes down, a job running on that node
restarts only when the node itself comes back up. Is it possible to have the
master detect when a node goes down and reschedule any running jobs?
This would have to work for both planned and unplanned outages, such as
when a node crashes. I tried using the '-r y' switch in qsub and
setting up checkpointing, but neither changed this behavior.