[GE users] sge_execd problems

Mag Gam magawake at gmail.com
Tue Oct 21 12:12:27 BST 2008


Thanks Rayson.

So, if I am using BDB spooling, can I convert to flat-file spooling? But I
think your advice about backing up is more critical!
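
For the periodic backup, something like the following might do as a starting point. It is only a minimal sketch: the function name and the archive naming scheme are made up for illustration, it assumes the destination directory lies outside $SGE_ROOT, and it assumes the qmaster is stopped (or classic spooling is in use) so the files are not changing while they are copied.

```python
import os
import tarfile
import time


def backup_sge_root(sge_root, dest_dir):
    """Write a timestamped .tar.gz of sge_root into dest_dir; return its path.

    Hypothetical helper, not part of Grid Engine. dest_dir must already
    exist and should not be located inside sge_root.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, "sge_root-%s.tar.gz" % stamp)
    # Store entries under the directory's own name rather than its
    # absolute path, so unpacking never writes to an absolute location.
    top_level = os.path.basename(os.path.normpath(sge_root))
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sge_root, arcname=top_level)
    return archive
```

It could be called from cron as backup_sge_root(os.environ["SGE_ROOT"], "/some/backup/dir"), where the destination path is a placeholder.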



On Sun, Oct 19, 2008 at 12:53 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> I always pick flatfile (classic) spooling. It's easier to back up,
> works with NFSv3, and works well for small to medium clusters.
>
> But the best way "to avoid something like this" is to back up your
> $SGE_ROOT periodically!!
>
> Rayson
>
>
>
> On 10/19/08, Mag Gam <magawake at gmail.com> wrote:
>> To avoid something like this, what is the recommended method to store
>> data? Database or flatfile?
>>
>>
>>
>> On Fri, Oct 17, 2008 at 2:15 PM, Rayson Ho <rayrayson at gmail.com> wrote:
>> > Should be available from the "spooling_method" parameter -- see the
>> > bootstrap(5) manpage.
>> >
>> > No matter which spooling method is used, it looks like your qmaster
>> > machine crashed recently, and you will need to fix the corrupted
>> > configuration files.
>> >
>> > Rayson
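
The check described above can be sketched roughly as follows. This assumes the bootstrap file sits at the conventional $SGE_ROOT/$SGE_CELL/common/bootstrap location and consists of "name value" lines with "#" comments; both helper names are made up for illustration.

```python
import os


def bootstrap_path(sge_root, sge_cell="default"):
    """Conventional location of the bootstrap file for a given cell."""
    return os.path.join(sge_root, sge_cell, "common", "bootstrap")


def read_bootstrap(path):
    """Parse a bootstrap file into a dict mapping parameter name to value."""
    params = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Split on the first run of whitespace: name, then the rest.
            parts = line.split(None, 1)
            params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params
```

read_bootstrap(bootstrap_path(os.environ["SGE_ROOT"])).get("spooling_method") should then give "classic" or "berkeleydb", the two values described in bootstrap(5).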
>> >
>> >
>> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
>> >> I don't know...how can I check?
>> >>
>> >> On Fri, Oct 17, 2008 at 2:00 PM, Rayson Ho <rayrayson at gmail.com> wrote:
>> >> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
>> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|cqueue_list_locate_qinstance("(null)@(null)"):
>> >> >> cqueue == NULL("(null)", "(null)", 1, 0
>> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|E|writing job
>> >> >> finish information: can't locate queue "(null)@(null)"
>> >> >
>> >> > Looks like your SGE configuration is corrupted! Are you using Berkeley
>> >> > DB spooling or classic spooling??
>> >> >
>> >> > Rayson
>> >> >
>> >> >
>> >> >
>> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|W|job 5014.1
>> >> >> failed on host <unknown host> before writing exit_status because:
>> >> >> shepherd exited with exit status 19
>> >> >> 10/17/2008 13:53:53|qmaster|master01.engrMec.unc.edu|C|!!!!!!!!!! got
>> >> >> NULL element for QU_rerun !!!!!!!!!!
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Fri, Oct 17, 2008 at 1:32 PM, Rayson Ho <rayrayson at gmail.com> wrote:
>> >> >> > Looks like a network resolution/connection problem... Are you able to
>> >> >> > connect to the master from the command line, like:
>> >> >> >
>> >> >> > % telnet master01.engrMec.unc.edu 536
>> >> >> >
>> >> >> > Rayson
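
Where telnet is not installed, the same reachability test can be approximated with a few lines of Python. This is only a sketch: the function name is made up, and it checks nothing beyond whether a TCP connection to the given host and port can be opened.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For the case in this thread that would be can_connect("master01.engrMec.unc.edu", 536). A False result matches the "Connection refused" that qhost reports, but does not say whether name resolution or the port is at fault.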
>> >> >> >
>> >> >> >
>> >> >> > On 10/17/08, Mag Gam <magawake at gmail.com> wrote:
>> >> >> >> I have the sgemaster running on our head node, and on the clients I
>> >> >> >> am able to start up sge_execd
>> >> >> >>
>> >> >> >> I see sge_execd process running on the client.
>> >> >> >>
>> >> >> >> But when I do
>> >> >> >>
>> >> >> >> $ qhost
>> >> >> >> error: commlib error: can't connect to service (Connection refused)
>> >> >> >> error: unable to contact qmaster using port 536 on host
>> >> >> >> "master01.engrMec.unc.edu"
>> >> >> >>
>> >> >> >>
>> >> >> >> When I start up the client I see no changes in the messages file
>> >> >> >> either. Has anyone seen this before? I am using GE 6.1u5.
>> >> >> >>
>> >> >> >> TIA
>> >> >> >>
>> >> >> >> ---------------------------------------------------------------------
>> >> >> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> >> >> For additional commands, e-mail: users-help at gridengine.sunsource.net