[GE users] Cleanup on Rescheduling and Deleting

Reuti reuti at staff.uni-marburg.de
Tue Jan 25 09:34:35 GMT 2005



Quoting Ron Chen <ron_chen_123 at yahoo.com>:

> Can your job scripts check whether the environment var
> $RESTARTED is set to the number of times SGE has
> restarted it?

For me, $RESTARTED is only ever 0 or 1, unless application-level checkpointing 
is used, in which case it is always 2 once the job has been restarted. It would 
be nice if it counted the number of restarts, though.
 
> And as an optimization, when $RESTARTED is 0, then
> don't sleep or clear the job output file.
> 
> BTW, I am not getting the behaviour you are getting.
> SGE always waits for the rescheduled jobs. Can you
> post  a sample job script?

I can reproduce the behavior on 6.0u1 on lx24_amd64 and on 5.3p6 on x86. I 
compared the clocks on the master and the slaves: in both cases it took around 
30 seconds until the old job was really killed.
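
Regarding the $RESTARTED check Ron suggests above, here is a minimal sketch of 
how a job script might do it. The function names, the default output path and 
the 60 second delay (Dan's "a minute", comfortably above the roughly 30 seconds 
I measured) are assumptions for illustration, not anything SGE provides:

```sh
#!/bin/sh
# Sketch only: is_restarted, prepare_run, OUTFILE and SETTLE_SECONDS are
# hypothetical names chosen for this example.

# True if SGE restarted this job. An unset RESTARTED is treated as 0
# (first run); 1 or 2 both count as a restart.
is_restarted() {
    [ "${RESTARTED:-0}" -ne 0 ]
}

# prepare_run OUTPUT_FILE [SETTLE_SECONDS]
# On a first run this does nothing. On a restart it waits for the old
# instance to be killed and then empties the stale output file.
prepare_run() {
    outfile=$1
    settle=${2:-60}
    if is_restarted; then
        sleep "$settle"
        : > "$outfile"
    fi
}

prepare_run "${OUTFILE:-./job_output.txt}" "${SETTLE_SECONDS:-60}"
# ... the real work of the job would follow here ...
```

With this, a job that was never rescheduled starts immediately, and only a 
restarted one pays for the delay.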

> --- Dan Gruhn <Dan.Gruhn at Group-W-Inc.com> wrote:
> > Hi Reuti,
> > 
> > Yes, delaying my script for a minute would be a workaround for now.
> > However, I am trying to squeeze as much out of my machines as I can,
> > and I am thinking that SGE's behavior in this case is wrong. It should
> > not be running the same job at the same time on different CPUs under
> > these or any other circumstances.
> > 
> > I think the proper sequence of events should be:
> > 
> > 1) Reschedule is requested
> > 2) Job 1 gets the USR2 signal
> > 3) After the notify time, job 1 exits
> > 4) Job 2 is now scheduled to be run.
> > 
> > Does this seem right to you?

Yes, agreed. The interesting thing is that the job disappears from the old node 
in the qstat output immediately. With a qdel, by contrast, you can sometimes 
see the job stay there for a few more seconds before it disappears.
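
For steps 2 and 3 of Dan's sequence, the job script itself has to react to the 
USR2 signal. A rough sketch, assuming the job is submitted with `qsub -notify` 
so that SIGUSR2 arrives ahead of the SIGKILL; the function name 
run_with_cleanup and the scratch directory argument are made up for this 
example:

```sh
#!/bin/sh
# Sketch only: run_with_cleanup is a hypothetical helper, not part of SGE.

# run_with_cleanup SCRATCH_DIR COMMAND [ARGS...]
# Runs COMMAND in the background and waits for it, so that a USR2 trap
# can fire right away instead of after COMMAND has finished.
run_with_cleanup() {
    scratch=$1
    shift
    # On USR2: stop the payload, remove the scratch directory, exit.
    trap 'kill "$child" 2>/dev/null; rm -rf "$scratch"; exit 0' USR2
    "$@" &
    child=$!
    wait "$child"
    status=$?
    # Normal completion: clean up as well and pass on the exit status.
    rm -rf "$scratch"
    return "$status"
}
```

A job script would then call something like 
`run_with_cleanup /path/to/scratch ./my_solver`, where both arguments are 
placeholders. This only makes the old instance exit cleanly within the notify 
time; it does not change when SGE dispatches the rescheduled job.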

Cheers - Reuti
