[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Chris Dagdigian dag at sonsorol.org
Thu Sep 25 22:55:25 BST 2008


Strange.

I've never seen that "error: no complex attribute for threshold  
np_load_avg" error before in qstat output.


I don't think it is a communication problem. If that were the case, your
queue instances would be in state "au", with the (u) meaning
'unreachable'. Also, when SGE does not hear from a compute node after a
timeout threshold, those "-NA-" fields should go back to the
built-in safety default of "99.99".
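
To make that check concrete, here is a minimal Python sketch (not part of
SGE) that splits one queue-instance line of "qstat -f" output and reports
whether the node looks unreachable or is simply not reporting load. The
function names and the state table are my own illustrative assumptions: the
table covers only a few state letters, and the parser assumes the
six-column 6.1 layout (queue name, qtype, used/tot, load_avg, arch, states).

```python
# Illustrative subset of queue-instance state letters; qstat(1) has the full list.
STATE_MEANINGS = {
    "a": "load alarm threshold exceeded",
    "u": "unknown: qmaster has lost contact with sge_execd",
    "d": "disabled",
    "E": "error",
}


def parse_queue_line(line):
    """Split one queue-instance line of 'qstat -f' output into named fields."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError("not a queue-instance line: %r" % line)
    name, qtype, slots, load_avg, arch = fields[:5]
    # The states column is empty for a healthy queue instance.
    states = fields[5] if len(fields) > 5 else ""
    return {"name": name, "qtype": qtype, "slots": slots,
            "load_avg": load_avg, "arch": arch, "states": states}


def explain_states(states):
    """Translate each state letter, falling back to a pointer to the man page."""
    return [STATE_MEANINGS.get(letter, "see qstat(1)") for letter in states]


def diagnose(line):
    """Classify a queue instance as unreachable, silent, or reporting load."""
    queue = parse_queue_line(line)
    if "u" in queue["states"]:
        return "unreachable"
    if queue["load_avg"] == "-NA-":
        return "reachable, no load value reported"
    return "reporting load"


if __name__ == "__main__":
    sample = "all.q@bionode34.biocluster BIP 0/1 -NA- -NA- a"
    print(diagnose(sample))
    print(explain_states(parse_queue_line(sample)["states"]))
```

A line in state "a" with "-NA-" load, like the one in this thread, falls
into the middle case: no (u) flag, but no load value either.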

We may have to wait for someone more familiar with the source code to
divine the root cause behind that error message. If that fails, the SGE
dev list may have more insight.

-Chris



On Sep 25, 2008, at 5:45 PM, Nick Tan wrote:

> Hi Chris,
>
> I ran utilbin/loadcheck and got this:
>
> arch            lx26-ppc64
> num_proc        2
> load_short      0.00
> load_medium     0.00
> load_long       0.00
> mem_free        4130.062500M
> swap_free       2047.992188M
> virtual_free    6178.054688M
> mem_total       4363.109375M
> swap_total      2047.992188M
> virtual_total   6411.101562M
> mem_used        233.046875M
> swap_used       0.000000M
> virtual_used    233.046875M
> cpu             0.3%
>
> It looks like it can collect the data so would that indicate a
> communication error then?
>
> Thanks,
>
> Nick
>
> Chris Dagdigian wrote:
>> Hi Nick,
>> I'm guessing that maybe the PDC part of SGE on your ppc systems is  
>> unable to poll the apple nodes to get load and state status.
>> Can you try the following?
>> Run the utilbin/loadcheck program on your PPC systems and see what  
>> comes back?
>> Running it on my OS X intel macbook pro returns:
>>> $ /opt/sge/utilbin/darwin-x86/loadcheck
>>> arch            darwin-x86
>>> num_proc        2
>>> load_short      1.35
>>> load_medium     1.37
>>> load_long       1.39
>>> mem_free        2044.082031M
>>> swap_free       0.000000M
>>> virtual_free    2044.082031M
>>> mem_total       4096.000000M
>>> swap_total      0.000000M
>>> virtual_total   4096.000000M
>>> mem_used        2051.917969M
>>> swap_used       0.000000M
>>> virtual_used    2051.917969M
>>> cpu             45.5%
>> If you can't find the equiv for your PPC/Linux setup then I think  
>> that may be the issue (SGE is running but can't collect local  
>> performance data)
>> Regards,
>> Chris
>> On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
>>> Hi all,
>>>
>>> I am setting up a cluster with 33 nodes running Linux on x86_64  
>>> (SunFire X2100) and 40 nodes running Linux on ppc64 (Apple Xserve  
>>> G5 cluster node).
>>>
>>> I am using the precompiled SGE binaries for the x86_64 nodes which  
>>> are working fine.  I have compiled SGE for the PPC64 nodes.  The  
>>> x86_64 nodes are running CentOS 5.2 and the PPC64 nodes are  
>>> running Fedora 9.
>>>
>>> sge_execd starts on the ppc64 node but I get this in the "qstat -f  
>>> -explain a" output
>>>
>>> all.q@bionode34.biocluster     BIP   0/1       -NA-     -NA-          a
>>>       error: no complex attribute for threshold np_load_avg
>>>
>>> What can I do to fix this?  I've searched the mailing list  
>>> archives but couldn't find anything so I'm hoping someone will be  
>>> able to help.
>>>
>>> Thanks,
>>>
>>> Nick
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> -- 
> Nick Tan
> Unix Systems Manager
> The Walter and Eliza Hall Institute
> nick at wehi.edu.au
>

