[GE users] job exit codes and tasks pending via hold_jid arguments

craffi dag at sonsorol.org
Tue Dec 22 18:33:11 GMT 2009


Hi folks,

Given this page:

http://wikis.sun.com/display/GridEngine/Troubleshooting+and+Error+Messages

... its tables seem to say that, apart from 99 and 100, any exit code 
from a job (not just 0) indicates a successful job execution to SGE.

Exit 100 seems nice, but from memory it puts the job into Eqw state or 
something similar, which then requires human intervention to clear 
manually.

This came up because of a simple workflow that uses job dependencies. 
At times the first job hits a specific error case and exits with a 
special workflow-meaningful code of 255. It looks like SGE does not see 
this as an actual failure and thus allows the dependent jobs to go on 
to dispatch and execution.
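
To make the behavior concrete, here is a minimal Python sketch of the 
exit-code handling as I read it from that page. It is only a model, not 
SGE code: the function names are made up for illustration, and the 
mapping (99 = requeue, 100 = error state, everything else = finished) 
is my understanding of the tables rather than something I have verified 
against the source.

```python
# Illustrative model only; none of these names exist in Grid Engine.
REQUEUE_CODE = 99   # assumed: job is rescheduled and run again
ERROR_CODE = 100    # assumed: job goes into an error state (Eqw) and stays in the system


def classify_exit(code: int) -> str:
    """Return how SGE is assumed to treat a job script's exit code."""
    if code == REQUEUE_CODE:
        return "requeue"
    if code == ERROR_CODE:
        return "error"
    # 0 and every other code, including a workflow-specific 255,
    # fall through here and are treated as a normally finished job.
    return "finished"


def dependents_released(code: int) -> bool:
    """Assumed -hold_jid rule: held jobs start once the upstream job leaves the system."""
    return classify_exit(code) == "finished"


if __name__ == "__main__":
    for code in (0, 1, 99, 100, 255):
        print(code, classify_exit(code), dependents_released(code))
```

Under this model, an exit code of 255 releases the dependent jobs just 
like an exit code of 0 does, which matches what I am seeing.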

I'm looking for the proper way to exit on error that does not make the 
job linger and also does not allow any jobs with -hold_jid set to 
execute when the upstream task leaves the system.

An epilog script? qalter or qdel from within the first job? Something else?

Regards,
Chris

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=234638
