[GE users] specifying max_errors for qsub

Ron Chen ron_chen_123 at yahoo.com
Wed Mar 9 15:23:06 GMT 2005


For that you need to put the logic in the script.

If you are using array jobs, then maybe it is easier
to enhance qmaster to do what you want without any
external scripts.
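
As a rough illustration of "logic in the script", here is a minimal
sketch of a wrapper that each job could run instead of the real
command. It is only one possible approach, not a Grid Engine feature:
the function names, the marker directory, and the ".failed" file
naming are all hypothetical, and it assumes every execution host can
see the same shared directory. Each failed task leaves one marker
file, and a task refuses to start once more than max_errors markers
exist.

```python
import os
import subprocess
import sys


def count_failures(marker_dir):
    """Count the failure markers left behind by earlier tasks."""
    return sum(1 for name in os.listdir(marker_dir) if name.endswith(".failed"))


def run_with_error_budget(command, marker_dir, max_errors, task_id):
    """Run `command` unless more than `max_errors` tasks have already failed.

    Returns the command's exit status, or None if the task was skipped.
    `marker_dir` is assumed to live on a filesystem shared by all hosts.
    """
    os.makedirs(marker_dir, exist_ok=True)
    if count_failures(marker_dir) > max_errors:
        return None  # error budget exhausted: skip the real work
    status = subprocess.call(command)
    if status != 0:
        # One empty file per failed task, so concurrent tasks never
        # have to update a shared counter.
        marker = os.path.join(marker_dir, "task-%s.failed" % task_id)
        with open(marker, "w"):
            pass
    return status


if __name__ == "__main__" and len(sys.argv) > 4:
    # Hypothetical usage: wrapper.py MARKER_DIR MAX_ERRORS TASK_ID command...
    result = run_with_error_budget(
        sys.argv[4:], sys.argv[1], int(sys.argv[2]), sys.argv[3]
    )
    sys.exit(0 if result is None else result)
```

For array jobs, the task id could be taken from the SGE_TASK_ID
environment variable. Note that with this sketch the remaining jobs
are still scheduled; they just exit immediately instead of being
removed from the queue with qdel.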

 -Ron

--- "Petla, Raghuram_Murthy"
<Raghuram.Murthy.Petla at deshaw.com> wrote:
> But how do I know the number of failed jobs then?
> 
> -----Original Message-----
> From: Ron Chen [mailto:ron_chen_123 at yahoo.com] 
> Sent: Wednesday, March 09, 2005 8:37 PM
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] specifying max_errors for
> qsub
> 
> 
> Ok, if you want to avoid constantly calling qacct,
> maybe you can use "qevent" to find out when a job is
> done and only run qacct once for that job.
> 
>  -Ron
> 
> --- "Petla, Raghuram_Murthy"
> <Raghuram.Murthy.Petla at deshaw.com> wrote:
> > Hi Ron,
> > 
> > Here, "fail" means that the job fails during execution.
> > 
> > More detailed explanation:
> > 
> > Assume that I have submitted a set of 200 jobs and I
> > want to set max_errors_allowed to 10.
> > Now SGE schedules each job when a node is free.
> > Assume that among the 50 jobs run so far, 10 exited
> > with non-zero values. I consider those 10 as failed
> > commands. If one more job exits with a non-zero
> > value, then the total number of failed commands is
> > more than 10, so I want to delete jobs 52:200
> > before they are scheduled.
> > 
> > I just want to check whether there is any option for
> > qsub to do this.
> > 
> > Currently, what I am doing is submitting the jobs and
> > constantly running the qacct command to get the exit
> > status of each job, then using qdel if the number of
> > failed commands is more than the maximum allowed.
> > 
> > I want to avoid constantly calling qacct, as it
> > hogs the CPU.
> > 
> > Thanks,
> > -Raghuram
> > 
> > 
> > > Each qsub request should be independent of the
> > > others.
> > >
> > > And by "fail", do you mean that it fails to submit,
> > > or that the job itself fails during execution?
> > >
> > > -Ron
> > 
> > --- "Petla, Raghuram_Murthy"
> > <Raghuram.Murthy.Petla at deshaw.com> wrote:
> > > Hello,
> > > 
> > > I am submitting around 200 jobs using qsub.
> > > Considering that a job fails if it exits with
> > > non-zero, I want to specify max_allowed_errors, so
> > > that qsub aborts the remaining jobs if there are
> > > more failed jobs than the maximum allowed.
> > > 
> > > How can I achieve this?
> > > 
> > > Thanks
> > > -Raghuram
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users-help at gridengine.sunsource.net


	
		




