[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Nick Tan nick at wehi.EDU.AU
Fri Sep 26 03:09:54 BST 2008


I will try compiling 32-bit binaries and see how it goes.  Will it 
matter if the PPC64 nodes use 32-bit binaries and the x86_64 nodes use 
64-bit binaries?
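
One quick way to confirm that a rebuilt binary really came out 32-bit is to read its ELF header. The sketch below is only an illustration and is not part of SGE: the names `elf_info` and `elf_info_from_file` are made up for this example, and the machine table covers only the architectures mentioned in this thread.

```python
import struct
import sys

# e_machine values from the ELF specification (only the ones relevant here).
MACHINES = {3: "x86", 20: "ppc", 21: "ppc64", 62: "x86_64"}


def elf_info(header: bytes):
    """Return (bits, byte_order, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])            # EI_CLASS
    order = {1: "little", 2: "big"}.get(header[5])  # EI_DATA
    if bits is None or order is None:
        raise ValueError("unrecognised ELF class or data encoding")
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    fmt = "<H" if order == "little" else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)
    return bits, order, MACHINES.get(machine, f"unknown({machine})")


def elf_info_from_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as fh:
        return elf_info(fh.read(20))


if __name__ == "__main__":
    # Hypothetical usage: pass one or more binary paths on the command line.
    for path in sys.argv[1:]:
        print(path, *elf_info_from_file(path))
```

Pointing it at the rebuilt sge_execd and at one of the precompiled x86_64 binaries shows at a glance which word size each build targets.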

I might also try using wireshark to sniff the traffic between the node 
and the qmaster to figure out whether something is wrong there as well.
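
Before capturing packets, the hostname-resolution possibility Ron raises below can be checked from both the execution host and the qmaster. This is a minimal sketch, assuming that comparing short host names is good enough for a first pass (SGE's own host name handling may differ). `check_resolution` and `can_connect` are hypothetical helpers, and the qmaster port has to be supplied by the caller because this thread does not say which port the site uses.

```python
import socket


def check_resolution(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Forward-resolve `name`, reverse-resolve the address, and report whether they agree.

    The resolver functions are parameters so the logic can be exercised without DNS.
    """
    try:
        addr = forward(name)
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        return {"name": name, "ok": False, "error": f"forward lookup failed: {exc}"}
    try:
        back = reverse(addr)[0]
    except OSError as exc:
        return {"name": name, "addr": addr, "ok": False,
                "error": f"reverse lookup failed: {exc}"}
    # Assumption: only the short (unqualified) names need to match.
    same = back.split(".")[0].lower() == name.split(".")[0].lower()
    return {"name": name, "addr": addr, "reverse": back, "ok": same}


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running `check_resolution` for the qmaster's name on bionode34, and for bionode34's name on the qmaster, would show whether the two sides disagree about names. `can_connect` with the site's qmaster port then tells whether the daemon is reachable at all.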

I don't know of any public PPC64 servers online, sorry.

Nick

Ron Chen wrote:
> Then there are 2 possibilities:
> 
> 1.) There really is a communication error (usually due to setup of hostname resolution) from the execution hosts to the qmaster.
> 
> 2.) There is still a bug in the 64-bit code, as 32-bit worked fine before:
> http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=2151
> 
> As a hack, you can change the arch script to make it think that it's executing on a 32-bit machine. Then, aimk will compile SGE in pure 32-bit.
> 
> BTW, do you know if there are any public PPC64 compile farms or servers available online? If I have time, I may be able to test the PPC64 Linux port.
> 
>  -Ron
> 
> 
> --- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
>> I've done as you suggested and recompiled but I am seeing the same
>> behaviour as before.
>>
>> Nick
>>
>> Ron Chen wrote:
>>> Then it really looks like a communication problem. qhost is really basic
>>> (with no complex settings or other kinds of setup needed).
>>> As you mentioned that TARGET_64BIT is defined, I grepped the source and
>>> found that there is a case for the LINUXAMD64 macro but not for
>>> TARGET_64BIT. I am wondering if that is right or not, as AMD64 is also 64-bit?
>>> So, one last thing that I can think of right now is in
>>> common/basis_types.h:
>>> #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64)
>>> #  define sge_U32CFormat "%u"
>>> #  define sge_U32CLetter "u"
>>> #  define sge_u32c(x)  (unsigned int)(x)
>>>
>>> #  define sge_X32CFormat "%x"
>>> #  define sge_x32c(x)  (unsigned int)(x)
>>> #else
>>> ...
>>> ...
>>>
>>> In the code, add a case for "TARGET_64BIT", like:
>>> #if defined(FREEBSD) || defined(NETBSD) || defined(LINUXAMD64) || defined(TARGET_64BIT)
>>>
>>> Do an "aimk clean" (since it is a header file, the dependency tracking
>>> may not detect the change) and recompile everything.
>>>  -Ron
>>>
>>>
>>> --- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
>>>> doing qhost shows:
>>>>
>>>> bionode01               lx24-amd64      8  0.00    7.8G  122.9M    2.0G      0.0
>>>> bionode34               -               -     -       -       -       -        -
>>>>
>>>> where bionode01 is an x86_64 node which is working and bionode34 is
>>>> a ppc64 node which isn't working.
>>>>
>>>> Nick
>>>>
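
With 40 ppc64 nodes it may help to pick out every host in this state automatically. The following is a rough sketch that scans qhost text for rows whose architecture column is "-", as in the bionode34 row above. It assumes the usual eight-column qhost layout with a header line, a dashed separator, and a "global" row; `hosts_without_load` is a name invented for this example.

```python
def hosts_without_load(qhost_output: str):
    """Return host names whose ARCH column is '-', i.e. rows with no reported values."""
    missing = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines and the dashed separator (fewer than eight columns).
        if len(fields) < 8:
            continue
        # Skip the header row and the "global" pseudo-host, which is always dashes.
        if fields[0] in ("HOSTNAME", "global"):
            continue
        if fields[1] == "-":
            missing.append(fields[0])
    return missing
```

Feeding it the captured output of qhost on the qmaster would list every execution host that has not delivered a load report, which makes it easier to see whether the problem is limited to the ppc64 nodes.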
>>>> Rayson Ho wrote:
>>>>> On 9/25/08, Nick Tan <nick at wehi.edu.au> wrote:
>>>>>> It looks like it can collect the data so would that indicate a
>>>>>> communication error then?
>>>>> What does qhost show??
>>>>>
>>>>> Rayson
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>> Chris Dagdigian wrote:
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> I'm guessing that maybe the PDC part of SGE on your ppc systems is unable
>>>>>>> to poll the Apple nodes to get load and state status.
>>>>>>> Can you try the following?
>>>>>>>
>>>>>>> Run the utilbin/loadcheck program on your PPC systems and see what comes
>>>>>>> back?
>>>>>>> Running it on my OS X Intel MacBook Pro returns:
>>>>>>>> $ /opt/sge/utilbin/darwin-x86/loadcheck
>>>>>>>> arch            darwin-x86
>>>>>>>> num_proc        2
>>>>>>>> load_short      1.35
>>>>>>>> load_medium     1.37
>>>>>>>> load_long       1.39
>>>>>>>> mem_free        2044.082031M
>>>>>>>> swap_free       0.000000M
>>>>>>>> virtual_free    2044.082031M
>>>>>>>> mem_total       4096.000000M
>>>>>>>> swap_total      0.000000M
>>>>>>>> virtual_total   4096.000000M
>>>>>>>> mem_used        2051.917969M
>>>>>>>> swap_used       0.000000M
>>>>>>>> virtual_used    2051.917969M
>>>>>>>> cpu             45.5%
>>>>>>>>
>>>>>>> If you can't find the equivalent for your PPC/Linux setup then I think that
>>>>>>> may be the issue (SGE is running but can't collect local performance data)
>>>>>>> Regards,
>>>>>>> Chris
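
To compare loadcheck output across nodes, the name/value pairs shown above can be parsed and checked for the fields a load threshold would need. This is a sketch under stated assumptions: that np_load_avg is derived as load_medium divided by num_proc, and that the four fields in `REQUIRED` are the relevant ones. The function names are hypothetical and not part of SGE.

```python
# Fields assumed (for this sketch) to be needed for a normalized load value.
REQUIRED = ("num_proc", "load_short", "load_medium", "load_long")


def parse_loadcheck(text):
    """Turn 'name   value' lines from loadcheck into a dict of strings."""
    values = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            values[parts[0]] = parts[1].strip()
    return values


def missing_load_fields(values):
    """List the required fields that loadcheck did not report."""
    return [name for name in REQUIRED if name not in values]


def np_load_estimate(values):
    """Estimate a per-processor load, assuming load_medium / num_proc."""
    return float(values["load_medium"]) / int(values["num_proc"])
```

If `missing_load_fields` comes back empty on a ppc64 node, local data collection is probably fine and the problem lies further along, between sge_execd and the qmaster.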
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am setting up a cluster with 33 nodes running Linux on x86_64 (SunFire
>>>>>>>> X2100) and 40 nodes running Linux on ppc64 (Apple Xserve G5 cluster node).
>>>>>>>> I am using the precompiled SGE binaries for the x86_64 nodes which are
>>>>>>>> working fine.  I have compiled SGE for the PPC64 nodes.  The x86_64 nodes
>>>>>>>> are running CentOS 5.2 and the PPC64 nodes are running Fedora 9.
>>>>>>>> sge_execd starts on the ppc64 node but I get this in the
>>>>>>>> "qstat -f -explain a" output
>>>>>>>> all.q@bionode34.biocluster     BIP   0/1       -NA-     -NA-          a
>>>>>>>>       error: no complex attribute for threshold np_load_avg
>>>>>>>> What can I do to fix this?  I've searched the mailing list archives but
>>>>>>>> couldn't find anything so I'm hoping someone will be able to help.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>> --
>>>>>> Nick Tan
>>>>>> Unix Systems Manager
>>>>>> The Walter and Eliza Hall Institute
>>>>>> nick at wehi.edu.au
>>>>>>

-- 
Nick Tan
Unix Systems Manager
The Walter and Eliza Hall Institute
nick at wehi.edu.au




