[GE users] job_pid: Permission denied

rsayde rsayde at carbondesignsystems.com
Fri Feb 26 16:41:26 GMT 2010

I'm having a problem when submitting a number of jobs with a command like "qrsh -now n -b y -V run1.sh".

When things go wrong, the progression is:

1. A job fails with:

 Exit Status      = 99
 Signal           = unknown signal
failed rescheduling because:
02/26/2010 10:48:32 [15119:18512]: exit_status of job start = 99

2. The job gets requeued and fails with:

 Exit Status      = -1
 Signal           = unknown signal
failed before job because:
02/26/2010 10:48:35 [0:20437]: can't open file job_pid: Permission denied

At that point the queue/host is put into the error state. The same job then keeps getting requeued until all the queue/hosts are in the error state.
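
To make it easier to compare failures across hosts, here is a minimal sketch of how the two "failed ... because" lines above could be pulled apart. It assumes every such line has the same layout as the two I pasted (date, time, a bracketed pair of numbers, then a message). The names parse_failure_line, FAILURE_LINE and EXIT_STATUS are made up for this example and are not part of Grid Engine, and I am not assuming what the two bracketed numbers mean.

```python
import re

# Assumed layout, based only on the two lines quoted above:
#   MM/DD/YYYY HH:MM:SS [number:number]: message
FAILURE_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<first>\d+):(?P<second>\d+)\]: (?P<message>.*)$"
)

# Matches the "exit_status of job start = 99" style of message.
EXIT_STATUS = re.compile(r"exit_status of job start = (-?\d+)")


def parse_failure_line(line):
    """Return a dict of fields for one failure line, or None if it does not match."""
    match = FAILURE_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    status = EXIT_STATUS.search(record["message"])
    # exit_status stays None when the message does not report one.
    record["exit_status"] = int(status.group(1)) if status else None
    return record


if __name__ == "__main__":
    samples = [
        "02/26/2010 10:48:32 [15119:18512]: exit_status of job start = 99",
        "02/26/2010 10:48:35 [0:20437]: can't open file job_pid: Permission denied",
    ]
    for sample in samples:
        print(parse_failure_line(sample))
```

Grouping the parsed records by message or exit status should show whether the 99 always comes before the job_pid error.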

So my questions are:

1. Which job_pid file is it failing to open?

2. Is it possible to disable requeueing (temporarily, I hope) with qrsh? I read that it can be done with qconf, but the docs don't say how.

3. Any ideas on how to figure out why the job start ended with exit status 99? Is it a resource issue?

If you need any other details about my system, please let me know.


