[GE users] Question regarding SGE submission methods (scheduling timeouts)

Kogan, Felix Felix-Kogan at deshaw.com
Tue Jun 20 18:05:00 BST 2006


Sure, we thought about this. It wouldn't solve the problem of pending
jobs, and you're right about jobs "running" on "dead" nodes. We've
created a reaper script that periodically deletes such "stuck" jobs (we
call them zombies). Do you know of any other method of getting rid of
zombies?

Thanks,

Felix

-----Original Message-----
From: Rayson Ho [mailto:rayrayson at gmail.com] 
Sent: Tuesday, June 20, 2006 11:44 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Question regarding SGE submission methods
(scheduling timeouts)

I think you can write a wrapper script to do it...

1) qsub the job, get the job id
2) sleep inside the script
3) when sleep wakes up, check if the job is running... otherwise qdel the job.
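The three steps above can be sketched as a small shell function. This is a minimal sketch, not a tested tool: the name submit_with_pending_timeout is hypothetical, and it assumes (a) qsub confirms a submission with a line like `Your job <id> ("<name>") has been submitted`, (b) the job state is the fifth column of plain qstat output, and (c) every pending state (qw, hqw, Eqw, ...) contains the letter "w" while started states (r, t, s, ...) do not. Check these formats against your SGE version before relying on it.

```shell
#!/bin/bash
# Hypothetical helper sketching the steps: qsub, sleep, then qdel if the
# job has still not started. Output formats parsed below are assumptions.

# Usage: submit_with_pending_timeout TIMEOUT_SECONDS QSUB_ARGS...
# Prints the job id and returns 0 if the job started (or already finished);
# returns 1 after deleting a job that is still pending; 2 on submit failure.
submit_with_pending_timeout() {
    local timeout=$1
    shift

    # 1) Submit and pull the id out of the assumed confirmation line
    #    "Your job <id> ("<name>") has been submitted". Array jobs
    #    ("Your job-array ...") are deliberately not matched.
    local out jobid
    out=$(qsub "$@") || return 2
    jobid=$(printf '%s\n' "$out" | awk '/^Your job [0-9]/ {print $3; exit}')
    if [ -z "$jobid" ]; then
        echo "could not parse a job id from: $out" >&2
        return 2
    fi

    # 2) Give the scheduler TIMEOUT seconds to start the job.
    sleep "$timeout"

    # 3) Look up the job state, assumed to be column 5 of plain qstat output.
    local state
    state=$(qstat | awk -v id="$jobid" '$1 == id {print $5; exit}')
    case "$state" in
        "")
            # No longer listed: assume it already ran and finished.
            echo "$jobid"
            return 0 ;;
        *w*)
            # Still waiting (qw, hqw, Eqw, ...): delete it and report a timeout.
            qdel "$jobid" >/dev/null
            echo "job $jobid still in state '$state' after ${timeout}s; deleted" >&2
            return 1 ;;
        *)
            # Started (r, t, s, ...): leave it alone.
            echo "$jobid"
            return 0 ;;
    esac
}
```

A caller could then write `jobid=$(submit_with_pending_timeout 300 some_job) || echo "not scheduled within 5 minutes"`, which gives roughly the "pending timeout" behaviour asked for, without any new qsub option.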

Also, "qsub -sync y" to wait for the job, plus a background sleep that
then does "kill <pid of qsub>", should also work...
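The "qsub -sync y" variant can be sketched the same way. Again this is a hedged sketch under stated assumptions: run_with_timeout is a hypothetical name, and the function only stops the local qsub from waiting. Killing qsub may leave the job queued or running in the cluster, so a qdel of the job may still be needed afterwards.

```shell
#!/bin/bash
# Hypothetical helper: wait for the job with "qsub -sync y", but give up
# after TIMEOUT seconds by killing the waiting qsub process.

# Usage: run_with_timeout TIMEOUT_SECONDS QSUB_ARGS...
# Returns qsub's exit status, or a signal status (e.g. 143) on timeout.
run_with_timeout() {
    local timeout=$1
    shift

    # Run qsub in the background; with -sync y it blocks until the job ends.
    qsub -sync y "$@" &
    local qsub_pid=$!

    # Background timer: after TIMEOUT seconds, send SIGTERM to the qsub.
    ( sleep "$timeout" && kill "$qsub_pid" 2>/dev/null ) &
    local timer_pid=$!

    # Wait for qsub to finish or be killed, and keep its exit status.
    local status=0
    wait "$qsub_pid" || status=$?

    # If qsub finished first, stop the timer. Its sleep child may linger
    # until it expires, which is harmless.
    kill "$timer_pid" 2>/dev/null
    return "$status"
}
```

Running the timer in its own subshell keeps the main function free to block in `wait`, so whichever finishes first (the job or the timer) decides when the call returns.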

One caveat: the timeout is enforced by the wrapper on the submission
host, so if that host crashes, the job still runs...

Rayson




On 6/19/06, Kogan, Felix <Felix-Kogan at deshaw.com> wrote:
> > We're running a large SGE 6.0u8 installation (hundreds of nodes).
> > Recently a lot of users started asking for a timeout feature. That is,
> > submit a job and make sure that it returns after a certain period of
> > time. Or submit a job and make sure that the submission command
> > returns within a specified timeout if the job cannot be executed right
> > away. Something like:
> >
> > # returns in 10 minutes (timeout)
> > $ qsub -to 600 some_job
> >
> > # qrsh returns an error message if the job hasn't been scheduled to
> > run in 5 minutes (pending timeout)
> > $ qrsh -now no -pt 300 some_job
> >
> > I've tried to find something relevant in the docs but could not. Could
> > someone please enlighten me? Maybe, if this is not possible now, it
> > will be possible in some future release?
> >
> > Thanks,
> >
> > Felix Kogan,
> > D.E.Shaw & Co
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net



