[GE users] active_jobs directory.

Iwona Sakrejda isakrejda at lbl.gov
Tue Mar 14 05:46:11 GMT 2006


I moved the spool directory to a filesystem where both root and sgeadmin
can write, and that resolved the problem. It was a root squashing issue
that I had forgotten about.
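
For anyone hitting the same symptom, one quick way to spot root squashing
is to try creating a file in the spool directory as each user. The sketch
below is only an illustration: the function name can_write and the probe
file name are made up for this example, and the path you pass in is assumed
to be your own execd_spool_dir.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report whether the current user
# can create a file in the given directory.
can_write() {
    dir=$1
    probe="$dir/.sge_write_probe.$$"   # throwaway file name, assumption
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
    else
        echo "not writable"
        return 1
    fi
}
```

Run it on the execution host once as root and once as the admin user
(sgeadmin in this thread), for example: can_write /path/to/spool
If root reports "not writable" on an NFS-mounted spool while sgeadmin
succeeds, root squashing on the export is a likely cause.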

Thanks a lot for your help!

Iwona



Iwona Sakrejda wrote:

> So I found out that sgeadmin can write to the spool area, but root cannot.
> Could this be the issue?
> The man page you pointed me to (sge_conf(5)) says root has to be able to.....
> 
> 
> iwona
> Ron Chen wrote:
> 
>> I followed the steps in sge_conf(5) and it works for me - did
>> you check whether the permission settings are the same as the
>> old one?
>>
>> Also, do you have a local host setting that is overriding the
>> global one?
>>
>>  -Ron
>>
>>
>> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>>
>>> No luck - still the same error.
>>> What I wonder about is why it does not give the full path.
>>> It says only:
>>> "cant open file   active_jobs/5.1/error" and so on....
>>>
>>> Ron Chen wrote:
>>>
>>>
>>>> Need to make sure that there are no jobs running on the host,
>>>> shut down execd on that host, and then change execd_spool_dir.
>>>
>>>> You can see the comments in:
>>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=103
>>>>
>>>> -Ron
>>>>
>>>>
>>>> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>>>>
>>>>
>>>>> Yes, I did, and I cleared the error.
>>>>> I restarted sgexecd on the compute node and on the submission node.
>>>>> Does it need to run on the master node? (I don't think it ran
>>>>> before.)
>>>>>
>>>>> I see the execution host creates directories in the default/spool
>>>>> area, but the job fails and there is a message in
>>>>> default/spool/<host>/messages:
>>>>>
>>>>> 03/13/2006 18:57:17|execd|pc2203|E|shepherd of job 5.1 exited with exit status = 7
>>>>> 03/13/2006 18:57:17|execd|pc2203|W|reaping job "5" ptf complains: Job does not exist
>>>>> 03/13/2006 18:57:17|execd|pc2203|E|abnormal termination of shepherd for job 5.1: no "exit_status" file
>>>>> 03/13/2006 18:57:17|execd|pc2203|E|cant open file active_jobs/5.1/error: No such file or directory
>>>>> 03/13/2006 18:57:17|execd|pc2203|E|can't open pid file "active_jobs/5.1/pid" for job 5.1
>>>>> 03/13/2006 18:57:17|execd|pc2203|I|sending admin mail mail to user "sgeadm at nersc.gov"|mailer "/common/sge/util/pdsf_mail"|"SGE 6.0u4: Job 5 failed"
>>>>>
>>>>>
>>>>> Ron Chen wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Did you change var "execd_spool_dir" with cmd "qconf -mconf"?
>>>>>> Then use cmd "qmod -cq <queue>" to clear the error.
>>>>>>
>>>>>> -Ron
>>>>>>
>>>>>>
>>>>>> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> My job execution is failing because the execution host
>>>>>>> cannot find the job it's supposed to run, and the
>>>>>>> <exec_host>/active_jobs directory is empty in the spool
>>>>>>> area.
>>>>>>>
>>>>>>> Jobs show up in the queue, but the queue on the execution
>>>>>>> host goes into an error state.
>>>>>>>
>>>>>>> What could have gotten misconfigured?
>>>>>>>
>>>>>>> Suggestions appreciated
>>>>>>>
>>>>>>> Iwona
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



