[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Ron Chen ron_chen_123 at yahoo.com
Fri Sep 26 03:02:53 BST 2008


Then there are 2 possibilities:

1.) There really is a communication error between the execution hosts and the qmaster (usually caused by a problem in the hostname resolution setup).

2.) There is still a bug in the 64-bit code, as 32-bit worked fine before:
http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=2151
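
For the first possibility, one way to rule hostname resolution in or out is to do the lookup by hand on the hosts involved. Below is a minimal C sketch, assuming plain POSIX getaddrinfo() and getnameinfo(); the helper name forward_lookup is made up for illustration and is not part of SGE.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * forward_lookup is a hypothetical helper (not part of SGE).
 * It resolves `host` to its first IPv4 address and writes the
 * dotted-quad string into `ip`.  Returns 0 on success, or the
 * non-zero getaddrinfo()/getnameinfo() error code on failure.
 */
int forward_lookup(const char *host, char *ip, size_t iplen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(host != NULL && ip != NULL && iplen > 0);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        return rc;                    /* gai_strerror(rc) describes the failure */
    }

    /* Turn the first returned address back into a numeric string. */
    rc = getnameinfo(res->ai_addr, res->ai_addrlen, ip, (socklen_t)iplen,
                     NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return rc;
}
```

Calling this from a small main() on the PPC64 execution host with the qmaster's name, and on the qmaster with the execution host's name, shows whether both sides resolve each other to the addresses you expect. A non-zero return, or a different address on each side, would point to possibility 1.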

As a hack, you can change the arch script to make it think that it's executing on a 32-bit machine. Then, aimk will compile SGE in pure 32-bit.
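
To confirm that a rebuild really came out as 32-bit, a tiny C file compiled with the same compiler and flags can report the data model. This is only a sketch with made-up function names, and it does not replace checking the actual SGE binaries with file(1). The byte-order helper is included only because the two node types in this thread differ in byte order.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; nothing here comes from SGE. */

/* Width in bits of a pointer in this build: 32 for a pure 32-bit
 * build, 64 for a 64-bit one. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of `long`: 64 on 64-bit Linux (LP64), 32 in a
 * 32-bit build. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Returns 1 on a big-endian host (such as the PPC64 nodes here),
 * 0 on a little-endian host (such as x86_64). */
int is_big_endian(void)
{
    const unsigned int probe = 1u;
    const unsigned char *bytes = (const unsigned char *)&probe;

    assert(sizeof(probe) > 1);  /* the check needs a multi-byte integer */
    return bytes[0] == 0;       /* lowest address holds the high byte */
}
```

Printing the three values from a main() built once normally and once with the modified arch script should show 64-bit pointers and longs in the first case and 32-bit ones in the second.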

BTW, do you know if there are any public PPC64 compile farms or servers available online? If I have time, I may be able to test the PPC64 Linux port.

 -Ron


--- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
> I've done as you suggested and recompiled but I am seeing the same
> behaviour as before.
> 
> Nick
> 
> Ron Chen wrote:
> > Then it really looks like a communication problem. qhost is really basic (with no complex settings or other kinds of setup needed).
> > 
> > As you mentioned that TARGET_64BIT is defined, I grepped the source and found that there is a case for the LINUXAMD64 macro but not for TARGET_64BIT. I am wondering whether that is right, as AMD64 is also 64-bit.
> > 
> > So, one last thing that I can think of right now is in common/basis_types.h:
> > 
> > #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64)
> > #  define sge_U32CFormat "%u"
> > #  define sge_U32CLetter "u"
> > #  define sge_u32c(x)  (unsigned int)(x)
> > 
> > #  define sge_X32CFormat "%x"
> > #  define sge_x32c(x)  (unsigned int)(x)
> > #else
> > ...
> > ...
> > 
> > In the code, add a case for "TARGET_64BIT", like:
> > 
> > #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64) || defined(TARGET_64BIT)
> > 
> > Do an "aimk clean" (since this is a header file, the dependency tracking may not pick up the change) and recompile everything.
> > 
> >  -Ron
> > 
> > 
> > --- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
> >> doing qhost shows:
> >>
> >> bionode01               lx24-amd64       8  0.00    7.8G  122.9M    2.0G     0.0
> >> bionode34               -                -     -       -       -       -       -
> >>
> >> where bionode01 is an x86_64 node which is working and bionode34 is
> >> a ppc64 node which isn't working.
> >>
> >> Nick
> >>
> >> Rayson Ho wrote:
> >>> On 9/25/08, Nick Tan <nick at wehi.edu.au> wrote:
> >>>> It looks like it can collect the data so would that indicate a
> >>>> communication error then?
> >>> What does qhost show??
> >>>
> >>> Rayson
> >>>
> >>>
> >>>> Thanks,
> >>>>
> >>>> Nick
> >>>>
> >>>>
> >>>> Chris Dagdigian wrote:
> >>>>> Hi Nick,
> >>>>>
> >>>>> I'm guessing that maybe the PDC part of SGE on your ppc systems is unable
> >>>>> to poll the apple nodes to get load and state status.
> >>>>> Can you try the following?
> >>>>>
> >>>>> Run the utilbin/loadcheck program on your PPC systems and see what comes
> >>>>> back?
> >>>>> Running it on my OS X intel macbook pro returns:
> >>>>>
> >>>>>> $ /opt/sge/utilbin/darwin-x86/loadcheck
> >>>>>> arch            darwin-x86
> >>>>>> num_proc        2
> >>>>>> load_short      1.35
> >>>>>> load_medium     1.37
> >>>>>> load_long       1.39
> >>>>>> mem_free        2044.082031M
> >>>>>> swap_free       0.000000M
> >>>>>> virtual_free    2044.082031M
> >>>>>> mem_total       4096.000000M
> >>>>>> swap_total      0.000000M
> >>>>>> virtual_total   4096.000000M
> >>>>>> mem_used        2051.917969M
> >>>>>> swap_used       0.000000M
> >>>>>> virtual_used    2051.917969M
> >>>>>> cpu             45.5%
> >>>>>>
> >>>>> If you can't find the equiv for your PPC/Linux setup then I think that may
> >>>>> be the issue (SGE is running but can't collect local performance data)
> >>>>> Regards,
> >>>>> Chris
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
> >>>>>
> >>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I am setting up a cluster with 33 nodes running Linux on x86_64 (SunFire
> >>>>>> X2100) and 40 nodes running Linux on ppc64 (Apple Xserve G5 cluster node).
> >>>>>> I am using the precompiled SGE binaries for the x86_64 nodes which are
> >>>>>> working fine.  I have compiled SGE for the PPC64 nodes.  The x86_64 nodes
> >>>>>> are running CentOS 5.2 and the PPC64 nodes are running Fedora 9.
> >>>>>> sge_execd starts on the ppc64 node but I get this in the "qstat -f
> >>>>>> -explain a" output
> >>>>>> all.q@bionode34.biocluster     BIP   0/1       -NA-     -NA-          a
> >>>>>>       error: no complex attribute for threshold np_load_avg
> >>>>>> What can I do to fix this?  I've searched the mailing list archives but
> >>>>>> couldn't find anything so I'm hoping someone will be able to help.
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Nick
> >>>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>>> --
> >>>> Nick Tan
> >>>> Unix Systems Manager
> >>>> The Walter and Eliza Hall Institute
> >>>> nick at wehi.edu.au


      

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



