[GE users] Question regarding SGE submission methods (scheduling timeouts)

Kogan, Felix Felix-Kogan at deshaw.com
Thu Jun 22 17:53:32 BST 2006


Yes, thanks. A combination of "max_unheard" (whose default is too
short, I think) and "reschedule_unknown" might help in some cases (when
accidentally rerunning a job while the original is not quite dead is
harmless).
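
For anyone who wants to check how these two settings are currently
configured, here is a minimal Python sketch, not an official tool. It
assumes the configuration text (for example from "qconf -sconf") consists
of whitespace-separated name/value lines with times written as HH:MM:SS.
The function names and the sample values are made up for illustration.

```python
def parse_hms(value):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def read_timeouts(conf_text, names=("max_unheard", "reschedule_unknown")):
    """Pick the named time settings out of configuration text.

    Assumes one "name value" pair per line, separated by whitespace.
    """
    found = {}
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in names:
            found[fields[0]] = parse_hms(fields[1])
    return found


# Illustrative text only; real input would come from the cluster itself.
SAMPLE = """\
max_unheard                  00:05:00
reschedule_unknown           00:00:00
"""

if __name__ == "__main__":
    for name, seconds in read_timeouts(SAMPLE).items():
        print(name, seconds)
```

Lines that do not look like a "name value" pair, or that name other
parameters, are simply skipped.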


I certainly would like to request this enhancement. How hard is it to
get a sunsource account? The page you suggested doesn't have any entry
forms, only a query form. Where can I find an explanation of the
necessary procedures?

Thanks,

Felix

-----Original Message-----
From: Ron Chen [mailto:ron_chen_123 at yahoo.com] 
Sent: Thursday, June 22, 2006 3:25 AM
To: users at gridengine.sunsource.net
Subject: RE: [GE users] Question regarding SGE submission methods
(scheduling timeouts)

BTW, another useful flag is "reschedule_unknown".

The timeout feature makes sense, and if you want to request it
to be added to the next version of SGE, you can enter an
enhancement request to the issue DB:

http://gridengine.sunsource.net/servlets/ProjectIssues

But before you do that, you will need to get a sunsource account
first.

 -Ron



--- "Kogan, Felix" <Felix-Kogan at deshaw.com> wrote:
> Oh, you mean a wrapper for SGE submission utilities. I thought it was
> about the wrapper for the jobs themselves. Well, how would you get a job
> ID in the wrapper in case of sqrsh? Also, users often submit hundreds of
> jobs in short succession using sqsub (even though we try to encourage
> them to use job arrays). The approach you described would mean hundreds
> of processes on the calling host waiting for the jobs to be scheduled
> and constantly calling qstat. Not a very healthy situation. No, this
> really would be useful and convenient if this functionality existed in
> the scheduler.
> 
> Thanks for mentioning "max_unheard". I've missed it somehow. It is
> really useful.
> 
> --
> Felix
> 
> -----Original Message-----
> From: Rayson Ho [mailto:rayrayson at gmail.com] 
> Sent: Tuesday, June 20, 2006 1:51 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Question regarding SGE submission methods
> (scheduling timeouts)
> 
> Don't understand why pending jobs wouldn't be solved by this - as long
> as you can get the job's status via qstat, then you can qdel the job.
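
The qstat/qdel approach described above could be sketched in Python
roughly as follows. This is only an illustration under stated
assumptions: the default qstat listing has the job ID in the first
column and the state in the fifth, and a job that is still waiting to
be scheduled shows the state "qw". The function names, the polling
interval and the injectable run/sleep/clock parameters are hypothetical.

```python
import subprocess
import time


def pending_job_ids(qstat_text):
    """Return the IDs of jobs whose state column is exactly "qw"."""
    ids = set()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric job ID; header and rule lines do not.
        # Other pending states (such as held jobs) are not matched here.
        if len(fields) >= 5 and fields[0].isdigit() and fields[4] == "qw":
            ids.add(fields[0])
    return ids


def delete_if_still_pending(job_id, timeout, poll=30, run=subprocess.run,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll qstat until the job leaves "qw" or the timeout passes.

    Returns True if the job was deleted with qdel, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        out = run(["qstat"], capture_output=True, text=True).stdout
        if job_id not in pending_job_ids(out):
            return False  # running, finished, or otherwise no longer "qw"
        if clock() >= deadline:
            run(["qdel", job_id])
            return True
        sleep(poll)
```

Each call watches a single job, so one such poller would be needed per
submitted job, which is the cost of doing this outside the scheduler.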
> 
> Also, to handle zombie jobs, see "max_unheard" in sge_conf(5).
> 
> Rayson
> 
> 
> 
> On 6/20/06, Kogan, Felix <Felix-Kogan at deshaw.com> wrote:
> > Sure, we thought about this. This wouldn't solve the problem of pending
> > jobs and you're right about the jobs "running" on the "dead" nodes.
> > We've created a reaper script that deletes such "stuck" jobs (we call
> > them zombies) periodically. Do you know about any other method of
> > getting rid of zombies?
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 






