[GE users] Can't get SGE 6.1u5 to work on Linux/PPC64

Ron Chen ron_chen_123 at yahoo.com
Thu Sep 25 22:53:29 BST 2008


Are all machines running exactly the same version, namely SGE 6.1u5?
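
One rough way to check, as a sketch only: gather a version banner from every host and confirm they all agree. The helper name same_version is made up for illustration, and it assumes one "<host> <version text>" line per machine on stdin, collected however suits your site.

```shell
# Hypothetical helper (not part of SGE): succeeds only if every host line
# on stdin carries the same version text after the host name.
same_version() {
    awk '
        NF == 0 { next }                      # ignore empty lines
        {
            sub(/^[^ \t]+[ \t]*/, "")         # drop the leading host name
            seen[$0] = 1                      # remember each distinct version text
        }
        END {
            n = 0
            for (v in seen) n++
            exit (n == 1 ? 0 : 1)             # exactly one distinct version => success
        }
    '
}
```

As far as I know, the first line of "qstat -help" on SGE 6.x is the version banner (for example "GE 6.1u5"), so running that on each x86_64 and ppc64 node and piping "host banner" lines into same_version should show quickly whether the self-compiled binaries match the precompiled ones. A host that returns no banner at all is counted as a mismatch.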

 -Ron


--- On Fri, 9/26/08, Nick Tan <nick at wehi.EDU.AU> wrote:
> I ran utilbin/loadcheck and got this:
> 
> arch            lx26-ppc64
> num_proc        2
> load_short      0.00
> load_medium     0.00
> load_long       0.00
> mem_free        4130.062500M
> swap_free       2047.992188M
> virtual_free    6178.054688M
> mem_total       4363.109375M
> swap_total      2047.992188M
> virtual_total   6411.101562M
> mem_used        233.046875M
> swap_used       0.000000M
> virtual_used    233.046875M
> cpu             0.3%
> 
> It looks like it can collect the data so would that indicate a
> communication error then?
> 
> Thanks,
> 
> Nick
> 
> Chris Dagdigian wrote:
> > Hi Nick,
> > 
> > I'm guessing that maybe the PDC part of SGE on your ppc systems is
> > unable to poll the apple nodes to get load and state status.
> > 
> > Can you try the following?
> > 
> > Run the utilbin/loadcheck program on your PPC systems and see what
> > comes back?
> > 
> > Running it on my OS X intel macbook pro returns:
> > 
> >> $ /opt/sge/utilbin/darwin-x86/loadcheck
> >> arch            darwin-x86
> >> num_proc        2
> >> load_short      1.35
> >> load_medium     1.37
> >> load_long       1.39
> >> mem_free        2044.082031M
> >> swap_free       0.000000M
> >> virtual_free    2044.082031M
> >> mem_total       4096.000000M
> >> swap_total      0.000000M
> >> virtual_total   4096.000000M
> >> mem_used        2051.917969M
> >> swap_used       0.000000M
> >> virtual_used    2051.917969M
> >> cpu             45.5%
> > 
> > 
> > If you can't find the equiv for your PPC/Linux setup then I think that
> > may be the issue (SGE is running but can't collect local performance
> > data)
> > 
> > Regards,
> > Chris
> > 
> > 
> > 
> > 
> > On Sep 25, 2008, at 2:26 AM, Nick Tan wrote:
> > 
> >> Hi all,
> >>
> >> I am setting up a cluster with 33 nodes running Linux on x86_64
> >> (SunFire X2100) and 40 nodes running Linux on ppc64 (Apple Xserve G5
> >> cluster node).
> >>
> >> I am using the precompiled SGE binaries for the x86_64 nodes which are
> >> working fine.  I have compiled SGE for the PPC64 nodes.  The x86_64
> >> nodes are running CentOS 5.2 and the PPC64 nodes are running Fedora 9.
> >>
> >> sge_execd starts on the ppc64 node but I get this in the
> >> "qstat -f -explain a" output
> >>
> >> all.q@bionode34.biocluster     BIP   0/1       -NA-     -NA-          a
> >>        error: no complex attribute for threshold np_load_avg
> >>
> >> What can I do to fix this?  I've searched the mailing list archives
> >> but couldn't find anything so I'm hoping someone will be able to help.
> >>
> >> Thanks,
> >>
> >> Nick
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > 
> 
> -- 
> Nick Tan
> Unix Systems Manager
> The Walter and Eliza Hall Institute
> nick at wehi.edu.au
> 
> 


      
