[GE users] SGE6.0u1 and user hold and -sync issue

Wilfried Gaensheimer wilfried at gaensheimer.de
Wed Jan 12 14:33:45 GMT 2005


Fred L Youhanaie wrote:

Hi,

I agree with Fred that "qsub -sync y" should return a non-zero
exit code if the batch job, or any task of an array job, failed.
One could extend the y|n option value, e.g. "-sync i" would indicate
"ignore batch job exit status" ...

From the user's point of view it makes no difference whether qsub
itself fails or the batch job does. You cannot continue from that point
without suitable measures. This is especially true when using qsub in make.
If you want to continue, you can always ignore the exit status.

I can live with the current situation (our submit wrapper reads the qsub 
-sync output and does the right thing).
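
For illustration, a minimal sketch of what such a wrapper might do with the
text printed by "qsub -sync y". It assumes that each finished job or array
task is reported on a line of the form "Job <id> exited with exit code <n>.";
that line format, and the names parse_sync_output and overall_status, are
assumptions made for this example and are not taken from our actual wrapper.

```python
import re

# Assumed report line (hypothetical format, check against your qsub output):
#   "Job 4711 exited with exit code 0."
#   "Job 4711.3 exited with exit code 1."
_EXIT_LINE = re.compile(r"^Job (\d+(?:\.\d+)?) exited with exit code (\d+)\.?$")


def parse_sync_output(text):
    """Return a dict mapping "jobid" or "jobid.taskid" to its exit code."""
    codes = {}
    for line in text.splitlines():
        match = _EXIT_LINE.match(line.strip())
        if match:
            codes[match.group(1)] = int(match.group(2))
    return codes


def overall_status(codes):
    """Return 0 only if at least one exit line was seen and all codes are 0."""
    if not codes:
        # No report at all is treated as a failure, to be on the safe side.
        return 1
    return 0 if all(code == 0 for code in codes.values()) else 1
```

A wrapper could feed the captured qsub output to parse_sync_output() and
exit with overall_status(), so that make stops when any task failed.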

BTW, is "qerr" a real tool? It's not in the SGE6 installation.
I'm asking because I'm looking for a way to retrieve the exit value
of previous jobs while avoiding the qacct overhead. Using qacct does not
seem to be reliable, e.g. if the next job follows very shortly after
the previous one has completed.
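
For reference, a rough sketch of the qacct route: pulling the exit status
per task out of text shaped like the output of "qacct -j <jobid>". It assumes
the usual "key value" layout with "taskid" and "exit_status" fields and a
line of "=" characters between records; the function name is made up for
this example. It does not address the timing problem, since the accounting
record may simply not be there yet when the next job starts.

```python
def parse_qacct_exit_status(text):
    """Map taskid -> exit_status from `qacct -j <jobid>` style text.

    Assumes "taskid" appears before "exit_status" within each record and
    that exit_status is a plain integer.
    """
    results = {}
    task = None
    for line in text.splitlines():
        if line.startswith("====="):
            # Record separator: forget the task id of the previous record.
            task = None
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        if key == "taskid":
            task = value
        elif key == "exit_status":
            results[task] = int(value)
    return results
```

An empty result would mean that no accounting record was found in the text,
which the caller has to treat as "not known yet" rather than as success.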

Bye
Wilfried
P.S.: I still haven't found time to file an issue, but it's on my list.

> 
> Hi Andreas,
> 
> How about using an interface similar to unix system calls, for example, 
> qsub will return 0 if all went well, including all the tasks of an array 
> job, and anything else will indicate that something was wrong. The 
> user/script would then use another utility, e.g. qerr <jobid>, which 
> would then extract the return code(s) of the job and print on stdout, to 
> be parsed further:
...
> If the job array has tens of thousands of tasks, the user will have to think 
> carefully about how to handle the output of qerr.
> 
> HTH
> 
> Cheers
> f.
> 

