[GE users] Waking up hosts when needed
rayrayson at gmail.com
Thu Feb 19 00:25:21 GMT 2009
There are scripts that power nodes on and off in a Grid Engine
cluster, and they use IPMI (via the Python IPMI module):
But of course the decision does not take resource requirements into
consideration when picking which nodes to power on/off...
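To make the point about resource requirements concrete, here is a minimal Python sketch of a resource-aware choice of which powered-off node to wake. Everything in it is hypothetical: the `Host` and `Request` records, the memory/InfiniBand attributes, and the "smallest host that fits" preference are assumptions for illustration. In a real setup the per-host values would presumably come from the execution hosts' `complex_values` in the Grid Engine configuration, and the request from the pending job's hard resource requests. Parsing those is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical per-host record; real values would come from Grid Engine."""
    name: str
    mem_gb: int        # installed memory, e.g. mirrored from complex_values
    ib: bool           # True if the host has an InfiniBand interface
    powered_on: bool


@dataclass
class Request:
    """Hypothetical summary of one pending job's hard resource requests."""
    mem_gb: int
    ib: bool = False


def satisfies(host: Host, req: Request) -> bool:
    # A host qualifies only if it has enough memory and, when InfiniBand
    # is requested, actually has an IB interface.
    return host.mem_gb >= req.mem_gb and (host.ib or not req.ib)


def pick_host_to_wake(hosts: List[Host], req: Request) -> Optional[Host]:
    """Return one powered-off host that can run the job, or None."""
    candidates = [h for h in hosts if not h.powered_on and satisfies(h, req)]
    if not candidates:
        return None
    # Prefer the smallest host that fits (and a non-IB host when IB is not
    # needed), so bigger nodes stay off for jobs that really need them.
    return min(candidates, key=lambda h: (h.mem_gb, h.ib, h.name))
```

For example, with an 8GB GigE node that is off, a 16GB InfiniBand node that is off, and another 16GB InfiniBand node that is already running, a job asking for 12GB plus InfiniBand gets the powered-off 16GB node, while a 4GB job gets the 8GB node.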
On 2/18/09, brs <brs at usf.edu> wrote:
> I've been pretty interested in this type of approach to cut electrical
> costs. Looking at the code, it does not appear to reference any host or
> queue complexes when deciding which hosts to bring up. Even better (more
> reliable) than sending a WOL magic packet à la ether-wake would be to use
> IPMI instead (most of our nodes support it, thankfully). Perhaps
> implementing complex attributes that include this information and
> referencing them from the script would be nice.
> Say we have a batch of 12 nodes: 6 have 8GB of memory and are
> GigE-connected, and the other 6 have 16GB of RAM and are connected with
> IB. Let's say that your job requests h_vmem (or something) for 12GB and
> requests ib=true (your complex attribute for InfiniBand). Obviously, it
> would be undesirable for the script to start up any of the 8GB GigE
> nodes, as they would not meet the requirements of the job(s) that are
> waiting. You could also have some "hidden" complexes that include
> information like ipmi=true or wol=true, so that all configuration is
> centralized in SGE and we don't have to maintain a separate
> configuration. This would allow the script to determine the best
> method for booting or powering off a host based on the configuration in SGE.
> I'd be interested in working on a project to implement this kind of
> behavior, but I thought hedeby was supposed to facilitate this?
> opoplawski wrote:
> > This is a script that I run here to wake up compute machines when
> > needed. We have our nodes configured to power off when not in use and
> > this script wakes them up when there are waiting jobs. Currently it
> > wakes up only one machine per run, and it is run every 2 minutes.
> > The only other needed piece is the "wakeup" script. This basically has
> > a list of MAC addresses for the hostnames and runs ether-wake to wake
> > the appropriate machine.
> > Hopefully someone else will find it useful.
> Brian Smith
> Sr. HPC Systems Administrator
> Research Computing, University of South Florida
> 4202 E. Fowler Ave. ENB308
> Office Phone: +1 813 974-1467
> Organization URL: http://rc.usf.edu
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
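The "hidden complexes" idea above (ipmi=true / wol=true per host) could drive the choice of power-on method. Below is a minimal Python sketch, assuming the flags, BMC address, and MAC address have already been read from the host's configuration into a hypothetical `WakeInfo` record. It prefers IPMI through the `ipmitool` command-line tool (swapped in here for the Python IPMI module, whose API is not shown; `-I lanplus` assumes an IPMI 2.0 BMC) and falls back to a Wake-on-LAN magic packet. Note that the WOL part sends the packet as a UDP broadcast to port 9, not as a raw Ethernet frame the way ether-wake does.

```python
import socket
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class WakeInfo:
    """Hypothetical per-host power settings, e.g. mirrored from hidden complexes."""
    name: str
    ipmi: bool = False
    wol: bool = False
    bmc_host: str = ""  # BMC address (needed for IPMI)
    mac: str = ""       # NIC MAC address (needed for Wake-on-LAN)


def magic_packet(mac: str) -> bytes:
    """Wake-on-LAN magic packet: six 0xFF bytes, then the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return b"\xff" * 6 + raw * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    # UDP broadcast instead of ether-wake's raw Ethernet frame; the NIC
    # scans the frame for the magic byte pattern, so the wrapper is ignored.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))


def ipmi_power_cmd(bmc_host: str, user: str, action: str = "on") -> List[str]:
    """Build an ipmitool command line; -E reads the password from IPMI_PASSWORD."""
    if action not in ("on", "off", "soft", "status"):
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def wake_method(host: WakeInfo) -> str:
    """Prefer IPMI when configured, otherwise fall back to Wake-on-LAN."""
    if host.ipmi and host.bmc_host:
        return "ipmi"
    if host.wol and host.mac:
        return "wol"
    raise ValueError(f"no usable power-on method for {host.name}")


def wake(host: WakeInfo, ipmi_user: str) -> None:
    """Power on one host using whichever method its settings allow."""
    if wake_method(host) == "ipmi":
        subprocess.run(ipmi_power_cmd(host.bmc_host, ipmi_user, "on"), check=True)
    else:
        send_wol(host.mac)
```

A polling script like the one described in the thread (run every couple of minutes, waking at most one host per pass) would call `wake()` once on the host it picked for the waiting job. The same `ipmi_power_cmd(..., "soft")` call could serve the power-off side.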