[GE users] Cleanup on Rescheduling and Deleting
reuti at staff.uni-marburg.de
Tue Jan 25 09:34:35 GMT 2005
Quoting Ron Chen <ron_chen_123 at yahoo.com>:
> Can your job scripts check if the environment var
> $RESTARTED to the number of times SGE has restarted
For me, $RESTARTED is only ever 0 or 1, unless you are using application-level
checkpointing, in which case it is always 2 once the job has been restarted.
It would be nice if it actually counted the number of restarts.
> And as an optimization, when $RESTARTED is 0, then
> don't sleep or clear the job output file.
> BTW, I am not getting the behaviour you are getting.
> SGE always waits for the rescheduled jobs. Can you
> post a sample job script?
I can reproduce the behavior on 6.0u1 on lx24_amd64 and on 5.3p6 on x86. I
checked the clocks on the master and slaves, and in both cases it took around
30 seconds until the old job was really killed.
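A minimal sketch of the workaround discussed in this thread, for a Bourne-shell
job script: when $RESTARTED is non-zero, wait for the old instance to be killed
and then truncate the output file. The function name prepare_after_restart, its
arguments, and the 60-second default are assumptions for illustration (the
delay follows the "a minute" suggested in the thread, against the roughly 30
seconds observed here); it is not an SGE facility.

```sh
#!/bin/sh
# Hypothetical helper, not part of SGE.
# $1 = job output file to clear after a restart
# $2 = seconds to wait for the old instance to die (assumed default: 60)
prepare_after_restart() {
    outfile=$1
    delay=${2:-60}

    # First run: RESTARTED is 0 (or unset), so skip the sleep and keep the file.
    if [ "${RESTARTED:-0}" -eq 0 ]; then
        return 0
    fi

    # Restarted run: give the old instance time to be killed, then
    # truncate the output file so stale output from it is discarded.
    sleep "$delay"
    : > "$outfile"
}
```

Near the top of a job script this could be called as, for example,
prepare_after_restart "$HOME/myjob.out" 60, where the path is again only a
placeholder.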
> --- Dan Gruhn <Dan.Gruhn at Group-W-Inc.com> wrote:
> > Hi Reuti,
> > Yes, delaying my script for a minute would be a work
> > around for now.
> > However, I am trying to squeeze as much out of my
> > machines as I can and
> > I am thinking that SGE's behavior in this case is
> > wrong. It should not
> > be running the same job at the same time on
> > different CPUs under these
> > or any other circumstances.
> > I think the proper sequence of events should be:
> > 1) Reschedule is requested
> > 2) Job 1 gets the USR2 signal
> > 3) After the notify time, job 1 exits
> > 4) Job 2 is now scheduled to be run.
> > Does this seem right to you?
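For steps 2 and 3 of the sequence quoted above, a job script can catch the
USR2 signal and clean up before it exits. The sketch below assumes the job was
submitted with "qsub -notify", so that SIGUSR2 arrives ahead of SIGKILL; the
function names and the scratch-directory argument are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers, not part of SGE.

# Remove the scratch directory given as $1, if it is set and exists.
cleanup_scratch() {
    if [ -n "$1" ] && [ -d "$1" ]; then
        rm -rf "$1"
    fi
    return 0
}

# Install a USR2 handler that cleans up the scratch directory $1 and exits.
install_usr2_trap() {
    scratch=$1
    trap 'cleanup_scratch "$scratch"; exit 1' USR2
}
```

A job script would call install_usr2_trap with its own scratch directory before
starting the real work. Note that the shell runs a trap only once it regains
control, so a long-running foreground command may have to be started in the
background and waited on for the handler to fire within the notify time.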
Yes, agreed. The interesting thing is that the job disappears from the old
node in the qstat output immediately. By contrast, after a qdel you can
sometimes see the job staying there for some additional seconds until it
Cheers - Reuti