[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Nick Tan nick at wehi.EDU.AU
Thu Sep 25 23:21:59 BST 2008



Running qhost shows:

bionode01               lx24-amd64      8  0.00    7.8G  122.9M    2.0G     0.0
bionode34               -               -     -       -       -       -       -

where bionode01 is an x86_64 node which is working and bionode34 is
a ppc64 node which isn't working.
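
For checking all 40 ppc64 nodes at once rather than reading the listing by eye, here is a minimal sketch that picks out hosts whose ARCH column is "-", i.e. hosts for which qmaster has no load report. The function name hosts_without_load and the SAMPLE text are made up for illustration; the header, separator and "global" rows in SAMPLE are my assumption about the usual qhost layout, since only the two host rows appear above.

```python
def hosts_without_load(qhost_output):
    """Return host names whose ARCH column is '-' (no load values reported)."""
    missing = []
    for line in qhost_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, the dashed separator row and
        # the 'global' pseudo-host (assumed layout, see SAMPLE below).
        if len(fields) < 2:
            continue
        if fields[0] in ("HOSTNAME", "global") or fields[0].startswith("-"):
            continue
        # A host that has not delivered load values shows '-' in every column.
        if fields[1] == "-":
            missing.append(fields[0])
    return missing


# Illustrative input: the two host rows are from the qhost output above,
# the first three rows are an assumed header/separator/global block.
SAMPLE = """\
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bionode01               lx24-amd64      8  0.00    7.8G  122.9M    2.0G     0.0
bionode34               -               -     -       -       -       -       -
"""

if __name__ == "__main__":
    print(hosts_without_load(SAMPLE))
```

On a live cluster the text could come from subprocess.run(["qhost"], capture_output=True, text=True).stdout instead of SAMPLE.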

Nick

Rayson Ho wrote:
> On 9/25/08, Nick Tan <nick at wehi.edu.au> wrote:
>> It looks like it can collect the data so would that indicate a
>> communication error then?
> 
> What does qhost show??
> 
> Rayson
> 
> 
>> Thanks,
>>
>> Nick
>>
>>
>> Chris Dagdigian wrote:
>>> Hi Nick,
>>>
>>> I'm guessing that maybe the PDC part of SGE on your ppc systems is unable
>>> to poll the Apple nodes to get load and state status.
>>> Can you try the following?
>>>
>>> Run the utilbin/loadcheck program on your PPC systems and see what comes
>>> back?
>>> Running it on my OS X Intel MacBook Pro returns:
>>>
>>>
>>>> $ /opt/sge/utilbin/darwin-x86/loadcheck
>>>> arch            darwin-x86
>>>> num_proc        2
>>>> load_short      1.35
>>>> load_medium     1.37
>>>> load_long       1.39
>>>> mem_free        2044.082031M
>>>> swap_free       0.000000M
>>>> virtual_free    2044.082031M
>>>> mem_total       4096.000000M
>>>> swap_total      0.000000M
>>>> virtual_total   4096.000000M
>>>> mem_used        2051.917969M
>>>> swap_used       0.000000M
>>>> virtual_used    2051.917969M
>>>> cpu             45.5%
>>>>
>>>
>>> If you can't find the equiv for your PPC/Linux setup then I think that may
>>> be the issue (SGE is running but can't collect local performance data)
>>> Regards,
>>> Chris
>>>
>>>
>>>
>>>
>>> On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
>>>
>>>
>>>> Hi all,
>>>>
>>>> I am setting up a cluster with 33 nodes running Linux on x86_64 (SunFire
>>>> X2100) and 40 nodes running Linux on ppc64 (Apple Xserve G5 cluster node).
>>>> I am using the precompiled SGE binaries for the x86_64 nodes which are
>>>> working fine.  I have compiled SGE for the PPC64 nodes.  The x86_64 nodes
>>>> are running CentOS 5.2 and the PPC64 nodes are running Fedora 9.
>>>> sge_execd starts on the ppc64 node but I get this in the "qstat -f
>>>> -explain a" output
>>>> all.q@bionode34.biocluster       BIP   0/1       -NA-     -NA-          a
>>>>       error: no complex attribute for threshold np_load_avg
>>>>
>>>> What can I do to fix this?  I've searched the mailing list archives but
>>>> couldn't find anything so I'm hoping someone will be able to help.
>>>> Thanks,
>>>>
>>>> Nick
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>> --
>> Nick Tan
>> Unix Systems Manager
>> The Walter and Eliza Hall Institute
>> nick at wehi.edu.au
>>
> 

-- 
Nick Tan
Unix Systems Manager
The Walter and Eliza Hall Institute
nick at wehi.edu.au




