[GE users] trapping errors and restarting tasks...

reuti reuti at staff.uni-marburg.de
Fri Feb 26 23:08:28 GMT 2010


On 26.02.2010 at 22:34, paul_simpson wrote:

> a very simple question, if you don't mind:
> i want to be able to resubmit a failed array task if it errors
> out.  however, i can't seem to get sge to play nice.  right now i'm:
> - trapping the rendering error in my bash wrapper script
> - exiting said script with error code 100  (as per:
> http://wikis.sun.com/display/gridengine62u5/Error+Messages )

Correct. Tasks that exit with 100 should be pushed back into the "qw"
state with the error flag set, which shows as an E in qstat. Once you
clear the error with `qmod -cj <job_id>`, they should be scheduled
again. To clear the error for an individual task, append the failed
task's index to the job id, separated by a dot, e.g. 1234.99
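
A minimal sketch of the wrapper side, assuming the renderer signals failure through a non-zero exit status. The function name `run_task` and the command `render_frame` are hypothetical placeholders; only the exit code 100 and the `qmod -cj` call come from this thread.

```shell
#!/bin/bash
# Hypothetical wrapper helper for one array task.
# SGE_TASK_ID is set by Grid Engine for array tasks; it may be unset elsewhere.

run_task() {
    # Run the command passed as arguments and remember its exit status.
    "$@"
    local status=$?
    if [ "$status" -ne 0 ]; then
        # Any failure is mapped to 100, which asks SGE to put the
        # task into the error state instead of marking it finished.
        echo "task ${SGE_TASK_ID:-?} failed with status $status" >&2
        return 100
    fi
    return 0
}

# Example use inside the job script (render_frame is a placeholder):
#   run_task render_frame "$SGE_TASK_ID" || exit $?
#
# Later, from a submit host, clear the error so the task can start again:
#   qmod -cj 1234.99
```

Mapping every failure to 100 keeps the decision in one place; if only some errors are worth retrying, the test on `status` could be narrowed accordingly.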

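Did you set "qmaster_params FORBID_APPERROR=TRUE" by accident in
SGE's configuration? With that setting a job cannot put itself into
the error state, which would match what you are seeing.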
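
One way to check for that setting is to scan the global configuration dump. This is only a sketch: the helper name `forbid_apperror_set` is made up, and it assumes the `qmaster_params` entry sits on a single line in the output of `qconf -sconf global`.

```shell
#!/bin/bash
# Hypothetical helper: reads a configuration dump on stdin and succeeds
# (exit status 0) if qmaster_params enables FORBID_APPERROR.
# Assumes the qmaster_params entry is not wrapped across several lines.

forbid_apperror_set() {
    grep -qiE '^qmaster_params[[:space:]].*FORBID_APPERROR=(true|1)'
}

# Example use on a host with SGE installed:
#   if qconf -sconf global | forbid_apperror_set; then
#       echo "FORBID_APPERROR is enabled; exit code 100 will be ignored"
#   fi
```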

-- Reuti

> ...however, the command just 'finishes' and is moved into the
> finished state, where it cannot be re-run.
> i assume that i'm approaching this in the wrong way and/or i'm being
> generally dumb.  would anyone mind pointing me in the right
> direction, as i'm sure this must be a very common procedure.
> many thanks in advance.
> -paul
> ps- i'm running sge 6.2u5, job is running in a pe (which manages  
> slots/cores) - all on opensuse 11.0 - 11.2.

