[GE users] sge_execd problems

Rayson Ho rayrayson at gmail.com
Sun Oct 19 17:53:16 BST 2008



I always pick flatfile (classic) spooling. It's easier to back up,
works with NFSv3, and works well for small to medium clusters.
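
To see which spooling method an existing cell uses, the
"spooling_method" line in the bootstrap file can be read directly.
A minimal sketch, assuming the usual layout where the file sits at
$SGE_ROOT/$SGE_CELL/common/bootstrap; the function name
sge_spooling_method is made up for illustration:

```shell
# Print the spooling method recorded in a Grid Engine bootstrap file.
# sge_spooling_method is a hypothetical helper name. The default path
# is an assumption based on the usual $SGE_ROOT/$SGE_CELL layout.
sge_spooling_method() {
    bootstrap_file="${1:-$SGE_ROOT/${SGE_CELL:-default}/common/bootstrap}"
    # Entries are "name value" pairs, one per line; print the value
    # that follows the spooling_method key.
    awk '$1 == "spooling_method" { print $2 }' "$bootstrap_file"
}
```

Called without arguments it reads the bootstrap file of the current
cell; a path can be passed to inspect a copy taken from a backup.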

But the best way "to avoid something like this" is to back up your
$SGE_ROOT periodically!!
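
One way to do that is a date-stamped tarball written from cron. This
is only a sketch: backup_sge_root is a hypothetical helper, and the
default destination /var/backups is an assumption, not something
Grid Engine sets up.

```shell
# Archive a directory tree into a date-stamped tarball and print the
# archive path. backup_sge_root is a hypothetical helper; the default
# destination directory is an assumed location.
backup_sge_root() {
    src="${1:-$SGE_ROOT}"          # tree to back up
    dest_dir="${2:-/var/backups}"  # where archives are kept (assumption)
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/sge_root-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C makes the paths inside the archive relative to the parent of $src
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

With classic spooling the configuration is plain files, so the
archive can be inspected or partially restored with ordinary tools.
A copy taken while qmaster is writing may catch a half-written file,
so it seems safer to keep several generations rather than one.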

Rayson



On 10/19/08, Mag Gam <magawake at gmail.com> wrote:
> To avoid something like this, what is the recommended method to store
> data? Database or flatfile?
>
>
>
> On Fri, Oct 17, 2008 at 2:15 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> > Should be available from the "spooling_method" parameter -- see the
> > bootstrap(5) manpage.
> >
> > No matter which spooling method is used, it looks like your qmaster
> > machine crashed recently, and you will need to fix the corrupted
> > configuration files.
> >
> > Rayson
> >
> >
> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
> >> I don't know...how can I check?
> >>
> >> On Fri, Oct 17, 2008 at 2:00 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> >> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|cqueue_list_locate_qinstance("(null)@(null)"):
> >> >> cqueue == NULL("(null)", "(null)", 1, 0
> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|writing job
> >> >> finish information: can't locate queue "(null)@(null)"
> >> >
> >> > Looks like your SGE configuration is corrupted! Are you using Berkeley
> >> > DB spooling or classic spooling??
> >> >
> >> > Rayson
> >> >
> >> >
> >> >
> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|W|job 5014.1
> >> >> failed on host <unknown host> before writing exit_status because:
> >> >> shepherd exited with exit status 19
> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|C|!!!!!!!!!! got
> >> >> NULL element for QU_rerun !!!!!!!!!!
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Oct 17, 2008 at 1:32 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> >> >> > Looks like a network resolution/connection problem... Are you able to
> >> >> > connect to the master from the command line, like:
> >> >> >
> >> >> > % telnet master01.engrMec.unc.edu 536
> >> >> >
> >> >> > Rayson
> >> >> >
> >> >> >
> >> >> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
> >> >> >> I have the sgemaster running on our head node and on the clients I
> >> >> >> am able to start up sge_execd
> >> >> >>
> >> >> >> I see sge_execd process running on the client.
> >> >> >>
> >> >> >> But when I do
> >> >> >>
> >> >> >> $ qhost
> >> >> >> error: commlib error: can't connect to service (Connection refused)
> >> >> >> error: unable to contact qmaster using port 536 on host
> >> >> >> "master01.engrMec.unc.edu"
> >> >> >>
> >> >> >>
> >> >> >> When I start up the client I see no change in the messages file
> >> >> >> either. Has anyone seen this before? Using GE 6.1u5
> >> >> >>
> >> >> >> TIA
> >> >> >>
> >> >> >> ---------------------------------------------------------------------
> >> >> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> >> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >> >> >>
> >> >> >>




More information about the gridengine-users mailing list