[GE users] Killed by Limit and Transfer to another Queue
gilbert at rldp.com.ph
Mon Dec 14 03:24:43 GMT 2009
Thank you again for your reply.
> Why on the same machine? One duty of the checkpointing interface is
> to copy the local data from $TMPDIR to any intermediate storage and
> then to the new node when the job continues on another machine.
Well, I'm not sure whether our software's execution can be suspended, transferred to another machine, and restarted there, but I know it can be suspended and restarted later.
(By the way, we are using Cadence-spectre.)
I guess I need to read more about our software and about SGE checkpointing too. This is a great idea.
> > The wallclock limit will be so short (about 5mins) that it would be
> > OK to restart the job. Would checkpoint's migration be applicable,
> > or are there other work around?
> Then you don't need any checkpointing facility. Just reschedule the
> job with: qmod -rj <job_id>
> But why do you want to do this? When you know beforehand that the job
> will run longer than 5 minutes, then you could request this run time
> (-l h_rt) and SGE would automatically send the job to the correct queue.
Sometimes it's hard to judge how long a simulation will take. Also, even when it is obvious that a job will run for more than 5 minutes, users still submit it to the queue with the limit. So I want the transfer to the other queue to happen automatically.
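
For reference, here is a minimal sketch of the two commands suggested in the quoted reply, written as a dry run that only prints the command lines instead of calling SGE. The script name job.sh, the job id 12345, and the helper function names are hypothetical; only -l h_rt and qmod -rj come from the reply above. It does not make the queue transfer automatic, it just shows what the manual steps would look like.

```shell
#!/bin/sh
# Dry-run sketch: prints SGE command lines, does not execute them.

# Convert a runtime estimate in seconds to HH:MM:SS for use with -l h_rt.
to_hrt() {
    secs=$1
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Build a qsub command line that requests a hard runtime limit.
# $1 = estimated runtime in seconds, $2 = job script (hypothetical name).
submit_cmd() {
    echo "qsub -l h_rt=$(to_hrt "$1") $2"
}

# Build the reschedule command from the quoted reply.
# $1 = job id (hypothetical value below).
reschedule_cmd() {
    echo "qmod -rj $1"
}

submit_cmd 1800 job.sh
reschedule_cmd 12345
```

With a 30-minute estimate, the first line printed requests h_rt=00:30:00, which should keep the job out of a queue whose limit is 5 minutes.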