[GE users] active_jobs directory.

Ron Chen ron_chen_123 at yahoo.com
Tue Mar 14 06:06:37 GMT 2006


Just came back from lunch and you've got it working :-)

But honestly, I am not sure whether root really needs write access to
the execd_spool_dir directory (even though the man page says so). If
someone knows, maybe he/she can shed some light?
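One way to check this on a particular node is to test directly whether root, and separately the admin user, can create files in the spool directory. Below is a minimal Python sketch. It assumes Python 3 is available on the execution host, and the script and placeholder path are hypothetical, not part of SGE. It cannot tell you whether execd actually needs root write access. It only shows who can write there.

```python
import os
import sys
import tempfile


def can_write(directory: str) -> bool:
    """Return True if this process can create and then remove a file in `directory`.

    A real write attempt is used instead of os.access(): on an NFS export with
    root squashing, the server maps root to an unprivileged user, so an actual
    operation is the most direct test of what root can do there.
    """
    try:
        # mkstemp creates a uniquely named file, so it cannot clobber spool data.
        fd, path = tempfile.mkstemp(prefix=".spool_probe_", dir=directory)
    except OSError:
        return False
    os.close(fd)
    try:
        os.unlink(path)
    except OSError:
        # The create succeeded, so the directory is writable; just report the leftover.
        print(f"warning: could not remove probe file {path}", file=sys.stderr)
    return True


if __name__ == "__main__":
    # Hypothetical placeholder path; pass the real execd_spool_dir as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/path/to/execd_spool_dir"
    result = "writable" if can_write(target) else "NOT writable"
    print(f"{target} is {result} for effective uid {os.geteuid()}")
```

Run it once as root and once as sgeadmin, passing the execd_spool_dir path as the argument. That would expose the kind of asymmetry root squashing causes.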

 -Ron


--- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> I changed the spool to a filesystem where both root and sgeadmin can
> write, and that resolved this problem. It was an issue with root
> squashing that I forgot about.
> 
> Thanks a lot for your help!
> 
> Iwona
> 
> 
> 
> Iwona Sakrejda wrote:
> 
> > So I found out that sgeadmin can write to the spool area, but root cannot.
> > Could this be the issue?
> > The man page you gave me (sge_conf(5)) says root has to be able to.....
> > 
> > 
> > iwona
> > Ron Chen wrote:
> > 
> >> I followed the steps in sge_conf(5) and it works for me. Did
> >> you check whether the permission settings are the same as on
> >> the old one?
> >>
> >> Also, do you have a local host setting that is overriding the
> >> global one?
> >>
> >>  -Ron
> >>
> >>
> >> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> >>
> >>> No luck, still the same error.
> >>> What I wonder about is why it does not give the full path.
> >>> It only says:
> >>> "cant open file active_jobs/5.1/error" and so on....
> >>>
> >>> Ron Chen wrote:
> >>>
> >>>
> >>>> You need to make sure that there are no jobs running on the host,
> >>>> shut down execd on that host, and then change execd_spool_dir.
> >>>> You can see the comments in:
> >>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=103
> >>>>
> >>>> -Ron
> >>>>
> >>>>
> >>>> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> >>>>
> >>>>
> >>>>> Yes, I did, and I cleared the error.
> >>>>> I restarted sgeexecd on the compute node and on the submission node.
> >>>>> Does it need to run on the master node? (I don't think it ran before.)
> >>>>>
> >>>>> I see the execution host creates directories in the default/spool
> >>>>> area, but the job fails and there is a message in
> >>>>> default/spool/<host>/messages:
> >>>>> 03/13/2006 18:57:17|execd|pc2203|E|shepherd of job 5.1 exited with exit status = 7
> >>>>> 03/13/2006 18:57:17|execd|pc2203|W|reaping job "5" ptf complains: Job does not exist
> >>>>> 03/13/2006 18:57:17|execd|pc2203|E|abnormal termination of shepherd for job 5.1: no "exit_status" file
> >>>>> 03/13/2006 18:57:17|execd|pc2203|E|cant open file active_jobs/5.1/error: No such file or directory
> >>>>> 03/13/2006 18:57:17|execd|pc2203|E|can't open pid file "active_jobs/5.1/pid" for job 5.1
> >>>>> 03/13/2006 18:57:17|execd|pc2203|I|sending admin mail mail to user "sgeadm at nersc.gov"|mailer "/common/sge/util/pdsf_mail"|"SGE 6.0u4: Job 5 failed"
> >>>>>
> >>>>>
> >>>>> Ron Chen wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> Did you change the variable "execd_spool_dir" with the command
> >>>>>> "qconf -mconf"?
> >>>>>> Then use the command "qmod -cq <queue>" to clear the error.
> >>>>>>
> >>>>>> -Ron
> >>>>>>
> >>>>>>
> >>>>>> --- Iwona Sakrejda <isakrejda at lbl.gov> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> My job execution is failing because the execution host
> >>>>>>> cannot find the job it's supposed to run, and the
> >>>>>>> <exec_host>/active_jobs directory in the spool area is empty.
> >>>>>>>
> >>>>>>> Jobs show up in the queue, but the queue on the execution
> >>>>>>> host goes into an error state.
> >>>>>>>
> >>>>>>> What could have gotten misconfigured?
> >>>>>>>
> >>>>>>> Suggestions appreciated
> >>>>>>>
> >>>>>>> Iwona
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>
>
---------------------------------------------------------------------
> >>
> >>>>>>> To unsubscribe, e-mail:
> >>>>>>> users-unsubscribe at gridengine.sunsource.net
> >>>>>>> For additional commands, e-mail:
> >>>>>>> users-help at gridengine.sunsource.net
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> __________________________________________________
> >>>>>> Do You Yahoo!?
> >>>>>> Tired of spam?  Yahoo! Mail has the best spam
> protection
> >>>>>
> >>>>>
> >>>>> around
> >>>>>
> >>>>>> http://mail.yahoo.com
> >>>>>>
> >>>>>
> >>>>
> >>>>
> 
=== message truncated ===
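The execd messages excerpt quoted in this thread is pipe-delimited: a timestamp, the daemon, the host, a one-letter severity, and then the message text. When tracking down failures like the missing active_jobs/5.1 files, a small filter that pulls out the error lines for one job can save some grepping. The Python sketch below assumes exactly that five-field layout, inferred only from the excerpt rather than from SGE documentation. The script name and its arguments are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, List


@dataclass
class LogEntry:
    timestamp: datetime
    daemon: str
    host: str
    severity: str  # one letter; the excerpt shows E, W and I
    message: str


def parse_messages(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield entries from lines shaped like 'MM/DD/YYYY HH:MM:SS|daemon|host|S|message'."""
    for raw in lines:
        # maxsplit=4 keeps any '|' inside the message (e.g. the admin-mail line) intact.
        parts = raw.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not in the assumed layout
        stamp, daemon, host, severity, message = parts
        try:
            ts = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp in the assumed format
        yield LogEntry(ts, daemon, host, severity, message)


def errors_for_job(entries: Iterable[LogEntry], job: str) -> List[LogEntry]:
    """Return error-level (E) entries that mention the job id as a whole token, e.g. '5.1'."""
    # \b boundaries stop '5.1' from also matching '15.1' or '5.10'.
    pattern = re.compile(r"\b" + re.escape(job) + r"\b")
    return [e for e in entries if e.severity == "E" and pattern.search(e.message)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python sge_log_filter.py <spool_dir>/<host>/messages 5.1
    log_path, job_id = sys.argv[1], sys.argv[2]
    # latin-1 decodes any byte sequence, so odd characters cannot abort the scan.
    with open(log_path, encoding="latin-1") as fh:
        for entry in errors_for_job(parse_messages(fh), job_id):
            print(entry.timestamp.isoformat(), entry.host, entry.message)
```

Splitting on at most four '|' characters matters because the admin-mail line in the excerpt contains '|' inside its message field.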






More information about the gridengine-users mailing list