[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Rayson Ho rayrayson at gmail.com
Fri Sep 26 03:23:03 BST 2008


On 9/25/08, Nick Tan <nick at wehi.edu.au> wrote:
> I might also try using wireshark to sniff the traffic between the node and
> the qmaster to try and figure out if there's something not right there too.
>
> I don't know of any public PPC64 servers online, sorry.

Try: http://clug.anu.edu.au/

gcc is 4.3.1, and the machine is a 2-way POWER5 that supports 64-bit:
tridge:~/c> gcc -m64 a.c
tridge:~/c> ./a.out
sizeof(void) = 8
tridge:~/c>
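The source of a.c is not shown. Judging by the output it most likely prints the size of a pointer, because `sizeof(void)` is not valid in standard C (GCC accepts it as an extension and evaluates it to 1). The sketch below is a hypothetical stand-in, not the original a.c: it reports the data model of a build, which is the question the rest of the thread turns on (was SGE really compiled as 64-bit code?). The function names `data_model`, `is_lp64` and `describe_build` are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch only: classify the data model the compiler targets from the
 * sizes of int, long and void *.  This is not the original a.c. */
const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* 64-bit Linux, e.g. x86_64, or ppc64 with -m64 */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";  /* classic 32-bit userland */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 8)
        return "LLP64";  /* 64-bit Windows */
    return "other";
}

/* Non-zero when this binary uses the LP64 model. */
int is_lp64(void)
{
    return strcmp(data_model(), "LP64") == 0;
}

/* Write a one-line summary such as
 * "void * = 8 bytes, long = 8 bytes (LP64)" into buf. */
int describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "void * = %zu bytes, long = %zu bytes (%s)",
                    sizeof(void *), sizeof(long), data_model());
}
```

Printed from a small `main` and compiled with `gcc -m64`, the summary should read `void * = 8 bytes, long = 8 bytes (LP64)`; the same file built with `-m32` should report ILP32.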

Rayson


>
> Nick
>
> Ron Chen wrote:
> > Then there are 2 possibilities:
> >
> > 1.) There really is a communication error (usually due to setup of
> > hostname resolution) from the execution hosts to the qmaster.
> >
> > 2.) There is still a bug in the 64-bit code, as 32-bit worked fine before:
> > http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=2151
> >
> > As a hack, you can change the arch script to make it think that it's
> > executing on a 32-bit machine. Then, aimk will compile SGE in pure 32-bit.
> >
> > BTW, do you know if there are any public PPC64 compile farms or servers
> > available online? If I have time, I may be able to test the PPC64 Linux
> > port.
> >
> >  -Ron
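Ron's first possibility, hostname resolution, can be checked from both ends: the execution host has to resolve the qmaster's name, and the qmaster has to resolve the execution host's name to the address it actually sees connections from. A minimal sketch using the standard POSIX `getaddrinfo()`/`getnameinfo()` calls follows; it is a generic check, not an SGE utility, and the function name `resolve_host` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve `host` and print each address it maps to.  Returns the number
 * of addresses printed, or -1 if resolution fails.  With numeric_only set,
 * only literal addresses are accepted (no DNS or /etc/hosts lookup). */
int resolve_host(const char *host, int numeric_only)
{
    struct addrinfo hints;
    struct addrinfo *res, *p;
    char addr[256];
    int count = 0;
    int rc;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, not per socket type */
    if (numeric_only)
        hints.ai_flags = AI_NUMERICHOST;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "%s: %s\n", host, gai_strerror(rc));
        return -1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        /* Convert each returned socket address back to a numeric string. */
        if (getnameinfo(p->ai_addr, p->ai_addrlen, addr, sizeof addr,
                        NULL, 0, NI_NUMERICHOST) == 0) {
            printf("%s -> %s\n", host, addr);
            count++;
        }
    }
    freeaddrinfo(res);
    return count;
}
```

Calling `resolve_host("bionode34.biocluster", 0)` on the qmaster and comparing the result with the address bionode34 reports for itself would show whether the two sides disagree.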
> >
> >
> > --- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
> >
> > > I've done as you suggested and recompiled but I am
> > > seeing the same behaviour as before.
> > >
> > > Nick
> > >
> > > Ron Chen wrote:
> > >
> > > > Then it really looks like a communication problem.
> > > > qhost is really basic (with no complex settings or other
> > > > kinds of setup needed).
> > > >
> > > > As you mentioned that TARGET_64BIT is defined, I
> > > > grepped the source and found that there is a case for the
> > > > LINUXAMD64 macro but not TARGET_64BIT. I am wondering if it
> > > > is right or not, as AMD64 is also 64-bit?
> > > >
> > > > So, one last thing that I can think of right now is in
> > > > common/basis_types.h:
> > > >
> > > > #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64)
> > > > #  define sge_U32CFormat "%u"
> > > > #  define sge_U32CLetter "u"
> > > > #  define sge_u32c(x)  (unsigned int)(x)
> > > >
> > > > #  define sge_X32CFormat "%x"
> > > > #  define sge_x32c(x)  (unsigned int)(x)
> > > > #else
> > > > ...
> > > > ...
> > > >
> > > > In the code, add a case for "TARGET_64BIT", like:
> > > >
> > > > #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64) || \
> > > >     defined(TARGET_64BIT)
> > > >
> > > > Do an "aimk clean" (since it is a header file, the dependency
> > > > tracking may not detect the change) and recompile everything.
> > > >
> > > >  -Ron
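These macros come in pairs because a printf-style format string carries no type information: on an LP64 build (such as PPC64 with `-m64`) `unsigned long` is 8 bytes, while `"%u"` expects a 4-byte `unsigned int`, so a mismatched format/argument pair is undefined behaviour. The sketch below copies only the first branch quoted above and shows the intended usage; the function `format_u32` and the `unsigned long` argument are illustrative assumptions, and the elided `#else` branch is not reproduced.

```c
#include <assert.h>
#include <stdio.h>

/* First branch of the block quoted above from common/basis_types.h. */
#define sge_U32CFormat "%u"
#define sge_U32CLetter "u"
#define sge_u32c(x)  (unsigned int)(x)
#define sge_X32CFormat "%x"
#define sge_x32c(x)  (unsigned int)(x)

/* Hypothetical helper: format a 32-bit quantity that is stored in an
 * unsigned long (8 bytes on LP64).  Passing `value` directly to "%u"
 * would be undefined behaviour on LP64; the cast macros make each
 * argument match its format specifier. */
int format_u32(char *buf, size_t len, unsigned long value)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "id=" sge_U32CFormat " hex=" sge_X32CFormat,
                    sge_u32c(value), sge_x32c(value));
}
```

For example, `format_u32(buf, sizeof buf, 255ul)` fills `buf` with `id=255 hex=ff`. Because the cast lives in the macro, call sites stay correct as long as each format macro and its cast macro are changed together.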
> > > >
> > > > --- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
> > > > >
> > > > > Running qhost shows:
> > > > >
> > > > > bionode01               lx24-amd64      8  0.00    7.8G  122.9M    2.0G     0.0
> > > > > bionode34               -               -     -       -       -       -       -
> > > > >
> > > > > where bionode01 is an x86_64 node which is working and
> > > > > bionode34 is a ppc64 node which isn't working.
> > > > >
> > > > > Nick
> > > > >
> > > > > Rayson Ho wrote:
> > > > >
> > > > > > On 9/25/08, Nick Tan <nick at wehi.edu.au> wrote:
> > > > > >
> > > > > > > It looks like it can collect the data so would that indicate a
> > > > > > > communication error then?
> > > > > > >
> > > > > > What does qhost show??
> > > > > >
> > > > > > Rayson
> > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Nick
> > > > > > >
> > > > > > >
> > > > > > > Chris Dagdigian wrote:
> > > > > > >
> > > > > > > > Hi Nick,
> > > > > > > >
> > > > > > > > I'm guessing that maybe the PDC part of SGE on your ppc systems is
> > > > > > > > unable to poll the apple nodes to get load and state status.
> > > > > > > >
> > > > > > > > Can you try the following?
> > > > > > > >
> > > > > > > > Run the utilbin/loadcheck program on your PPC systems and see what
> > > > > > > > comes back.
> > > > > > > >
> > > > > > > > Running it on my OS X Intel MacBook Pro returns:
> > > > > > > >
> > > > > > > > > $ /opt/sge/utilbin/darwin-x86/loadcheck
> > > > > > > > > arch            darwin-x86
> > > > > > > > > num_proc        2
> > > > > > > > > load_short      1.35
> > > > > > > > > load_medium     1.37
> > > > > > > > > load_long       1.39
> > > > > > > > > mem_free        2044.082031M
> > > > > > > > > swap_free       0.000000M
> > > > > > > > > virtual_free    2044.082031M
> > > > > > > > > mem_total       4096.000000M
> > > > > > > > > swap_total      0.000000M
> > > > > > > > > virtual_total   4096.000000M
> > > > > > > > > mem_used        2051.917969M
> > > > > > > > > swap_used       0.000000M
> > > > > > > > > virtual_used    2051.917969M
> > > > > > > > > cpu             45.5%
> > > > > > > > >
> > > > > > > > If you can't find the equivalent for your PPC/Linux setup then I
> > > > > > > > think that may be the issue (SGE is running but can't collect
> > > > > > > > local performance data).
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Chris
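The threshold in the original error, np_load_avg, is a normalised load value, which is why loadcheck reports num_proc alongside load_short, load_medium and load_long. Assuming np_load_avg is the medium (5-minute) load average divided by the processor count, a minimal sketch of the figures a load collector has to produce on Linux could use the glibc/BSD `getloadavg()` and `sysconf(_SC_NPROCESSORS_ONLN)`. This is not how SGE's PDC is implemented, and the function names are hypothetical; it only shows what "can't collect local performance data" would leave missing.

```c
#define _DEFAULT_SOURCE  /* exposes getloadavg() in glibc's <stdlib.h> */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Divide a load average by the processor count; for example
 * load_medium 1.37 with num_proc 2 gives about 0.685. */
double normalized_load(double load, long num_proc)
{
    assert(num_proc > 0);
    return load / (double)num_proc;
}

/* Read the 1-, 5- and 15-minute load averages (load_short, load_medium
 * and load_long in loadcheck's output) plus the online CPU count, and
 * return the normalised 5-minute value, or -1.0 if either is unavailable. */
double current_np_load_medium(void)
{
    double loads[3];
    long num_proc = sysconf(_SC_NPROCESSORS_ONLN);

    if (num_proc < 1 || getloadavg(loads, 3) != 3)
        return -1.0;
    return normalized_load(loads[1], num_proc);
}
```

If a ppc64 build could not produce these numbers, or the execd never delivered them to the qmaster, a threshold on np_load_avg would have nothing to compare against.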
> > > > > > > >
> > > > > > > > On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I am setting up a cluster with 33 nodes running Linux on x86_64
> > > > > > > > > (SunFire X2100) and 40 nodes running Linux on ppc64 (Apple Xserve
> > > > > > > > > G5 cluster node).
> > > > > > > > >
> > > > > > > > > I am using the precompiled SGE binaries for the x86_64 nodes which
> > > > > > > > > are working fine.  I have compiled SGE for the PPC64 nodes.  The
> > > > > > > > > x86_64 nodes are running CentOS 5.2 and the PPC64 nodes are
> > > > > > > > > running Fedora 9.
> > > > > > > > >
> > > > > > > > > sge_execd starts on the ppc64 node but I get this in the
> > > > > > > > > "qstat -f -explain a" output:
> > > > > > > > >
> > > > > > > > > all.q@bionode34.biocluster     BIP   0/1       -NA-     -NA-          a
> > > > > > > > >      error: no complex attribute for threshold np_load_avg
> > > > > > > > >
> > > > > > > > > What can I do to fix this?  I've searched the mailing list archives
> > > > > > > > > but couldn't find anything so I'm hoping someone will be able to
> > > > > > > > > help.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Nick
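The qstat line and the qhost output fit together: bionode34 reports no load values at all ("-NA-" in qstat, "-" in qhost), so a load threshold on np_load_avg has nothing to compare against and the queue instance is flagged with state "a" (load alarm). Below is a hypothetical, simplified model of that check, not SGE's implementation. It assumes a threshold of 1.75 (a common all.q default; `qconf -sq all.q` shows the actual `load_thresholds` value) and `>=` as the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one reported load value. */
struct load_value {
    const char *name;   /* e.g. "np_load_avg" */
    double value;
};

enum threshold_result {
    THRESHOLD_OK,        /* value reported and below the limit */
    THRESHOLD_EXCEEDED,  /* value reported and at or above the limit */
    THRESHOLD_MISSING    /* no value reported for the attribute at all */
};

/* Look up `attr` among the reported values and compare it with `limit`. */
enum threshold_result check_threshold(const struct load_value *reported,
                                      size_t n, const char *attr,
                                      double limit)
{
    size_t i;

    assert(attr != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(reported[i].name, attr) == 0)
            /* ">=" is assumed here as the comparison for np_load_avg */
            return reported[i].value >= limit ? THRESHOLD_EXCEEDED
                                              : THRESHOLD_OK;
    }
    return THRESHOLD_MISSING;  /* e.g. the execd's load report never arrived */
}
```

In this model the x86_64 node (np_load_avg 0.00) would pass, while bionode34, with no reported values, lands in the THRESHOLD_MISSING case that corresponds to the "no complex attribute for threshold" message.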
> > > > > > > > >
> > > > > > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > > > > > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > > > > > > --
> > > > > > > Nick Tan
> > > > > > > Unix Systems Manager
> > > > > > > The Walter and Eliza Hall Institute
> > > > > > > nick at wehi.edu.au




More information about the gridengine-users mailing list