[GE users] job_pid: Permission denied

reuti reuti at staff.uni-marburg.de
Wed Mar 3 16:42:55 GMT 2010


On 03.03.2010 at 16:57, rsayde wrote:

>> Was the call the last executable line in the script? Then its exit
>> status is taken by the interpreting shell as the value returned to
>> the calling process.
>
> No, but the script is run with -e, which causes it to exit on the
> first failing command. I fixed this by adding a wrapper script
> that checks the return status and returns 1 on failure and 0 on
> success.
>
>> No, the origin doesn't matter; it is the same in the end. Does the
>> setup of your application need some precautions that, for whatever
>> reason, are only present on its first call, and that leave some
>> file or directory in a dirty state afterwards, which then makes the
>> application crash for sure?
>
> The 99 occurred because I gave the application some bad data, so it
> was a genuine error on my part and the job should not be rescheduled.
>
>>
>> Maybe mapping the 99 to 100, as an application error, would be
>> better then.
>> You could clean up the problematic state first and then clear the
>> error to rerun the job.
>
> I don't understand why 100 is better than 99. I've tried to read  
> the docs on handling errors and I'm confused. Do I need to manage  
> files in the spool directory? I would think that SGE could do that.

100 will keep the job in "qw" state, but with an error condition set.
You can clean up the problem and reset the error with `qmod -cj <jobid>`
to give the job another chance to run (and it will stay at the top of
the waiting list). It would also prevent the faulty job from
automatically flooding the cluster.
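The mapping suggested above could be sketched as a small wrapper that the job script calls instead of the application directly: a 99 is turned into 100, so the job goes into error state (listed by `qstat` as `Eqw`) and waits for `qmod -cj <jobid>` instead of being rescheduled. This is a sketch assuming bash; `run_app` and `fake_app` are hypothetical names, not Grid Engine features.

```shell
#!/bin/bash
# Hypothetical wrapper: run the given command and translate exit status 99
# (reschedule) into 100 (error state); every other status passes through.
run_app() {
    local rc=0
    "$@" || rc=$?          # capture the status without tripping "set -e"
    if [ "$rc" -eq 99 ]; then
        # 100 puts the job into error state instead of requeueing it.
        return 100
    fi
    return "$rc"
}

# Hypothetical stand-in for an application that fails with 99 on bad input.
fake_app() { return 99; }

status=0
run_app fake_app || status=$?
echo "wrapper returned $status"
```

In a real job script, the last executed line would be `run_app` followed by the actual application command, so the translated status becomes the script's exit status. Unlike a wrapper that collapses every failure to 1, this one keeps other exit codes intact for later diagnosis.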

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=246881