[GE users] job_pid: Permission denied

reuti reuti at staff.uni-marburg.de
Mon Mar 1 18:58:25 GMT 2010


Hi,

On 26.02.2010 at 17:41, rsayde wrote:

> I'm having a problem where I am submitting a number of jobs using  
> something like "qrsh -now n -b y -V run1.sh".
>
> When things go wrong the progression is:
>
> 1. A job fails with:
>
>  Exit Status      = 99
>  Signal           = unknown signal
> <snip>
> failed rescheduling because:
> 02/26/2010 10:48:32 [15119:18512]: exit_status of job start = 99
>
> 2. The job gets requeued and fails with:
>
>  Exit Status      = -1
>  Signal           = unknown signal
> <snip>
> failed before job because:
> 02/26/2010 10:48:35 [0:20437]: can't open file job_pid: Permission  
> denied
>
> At that point the queue/host gets put into the error state. That  
> job keeps getting requeued until all the queue/hosts are in the  
> error state.
>
> So my questions are:
>
> 1. What job_pid file is it having trouble writing?

Did both executions run on the same machine? I think these are two
unrelated effects: 1. comes from your application, and 2. one of the
machines you use has an improper setting for the spool directory. Is
it local on each machine?
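
A rough way to check the spool directory on a given host is sketched below. It assumes the path is read from the execd_spool_dir line of "qconf -sconf" (or of the host-local configuration) and that the check is run under the account the execution daemon uses; the function name check_spool_dir is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a spool directory exists and is
# writable by the current user, and show which filesystem backs it.
# Assumption: run this as the account sge_execd runs under.
check_spool_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    # df -P prints the backing filesystem; an entry of the form
    # "server:/export" would suggest NFS rather than a local disk.
    df -P "$dir" | tail -n 1
    return 0
}
```

Running it on every execution host with the configured path as its only argument should show whether one machine differs from the others.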


> 2. Is it possible to disable requeueing (temporarily hopefully)  
> with qrsh? I read that it can be done using qconf, but the docs  
> don't say how.

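It's necessary to set the following in the global cluster configuration
(displayed by "qconf -sconf", edited with "qconf -mconf"):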

$ qconf -sconf
...
qmaster_params FORBID_RESCHEDULE=TRUE

(man qconf; qmaster_params itself is described in man sge_conf)
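
To verify that the parameter is active, a small filter over the configuration text could be used. This is only a sketch: the function name has_forbid_reschedule is hypothetical, and it assumes the parameter sits on the qmaster_params line itself rather than on a wrapped continuation line.

```shell
#!/bin/sh
# Hypothetical helper: read configuration text on stdin (for example the
# output of "qconf -sconf") and succeed only if the qmaster_params line
# contains FORBID_RESCHEDULE=TRUE (matched case-insensitively).
has_forbid_reschedule() {
    grep -E '^qmaster_params[[:space:]]' | grep -qi 'FORBID_RESCHEDULE=TRUE'
}
```

Typical use would be along the lines of: qconf -sconf | has_forbid_reschedule && echo "rescheduling is forbidden"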


> 3. Any ideas on how to figure out why the job_start ended with 99?  
> Is it a resource issue?

I would assume that it's just one of the errors your application can
create under certain conditions. Is it Molcas? Grid Engine treats an
exit code of 99 as a request to reschedule the job, so you may always
need the setting mentioned above for this error to be handled properly.
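
If changing the cluster configuration is not an option, the exit code could instead be remapped inside the job script. The wrapper below is a minimal sketch of that idea; the name run_no_requeue and the replacement value 1 are arbitrary choices, not anything Grid Engine prescribes.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and remap exit status 99, which
# Grid Engine interprets as a request to reschedule the job, to another
# non-zero value (1 here, an arbitrary choice).
run_no_requeue() {
    "$@"
    status=$?
    if [ "$status" -eq 99 ]; then
        echo "command exited with 99; reporting 1 instead" >&2
        return 1
    fi
    return "$status"
}
```

Inside run1.sh the application would then be started as "run_no_requeue your_application args", so a 99 from the application still fails the job but no longer triggers a requeue.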

-- Reuti


> If you need any other details about my system, please let me know.
>
> thanks
> Richard
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=246206
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=246559
