[GE users] Waking up hosts when needed

rayson rayrayson at gmail.com
Thu Feb 19 00:25:21 GMT 2009


There are scripts that power on and off nodes in a Grid Engine
cluster, and they use IPMI (via the Python IPMI module):


But of course the decision does not take resource requirements into
consideration when picking which nodes to power on/off...
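
For what it's worth, here is a rough sketch of what a requirement-aware
selection could look like, using Brian's 8GB/GigE vs. 16GB/IB example
below. This is not taken from any of the existing scripts. The Host and
JobRequest types and the hosts_to_wake() function are made-up names, and
I am assuming the host attributes and pending requests have already been
collected from Grid Engine somehow. The default limit of 1 mirrors the
"one machine per run" behaviour of Orion's script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical view of an execution host; not a Grid Engine type."""
    name: str
    mem_gb: int
    ib: bool          # True if the host is InfiniBand-connected
    powered_on: bool


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical summary of what one pending job asked for."""
    mem_gb: int
    ib: bool = False


def satisfies(host, job):
    """A host fits a job if it has enough memory and IB when IB is requested."""
    return host.mem_gb >= job.mem_gb and (host.ib or not job.ib)


def hosts_to_wake(hosts, pending, limit=1):
    """Return up to `limit` powered-off hosts that fit at least one pending job."""
    chosen = []
    for host in hosts:
        if len(chosen) >= limit:
            break
        if host.powered_on:
            continue
        if any(satisfies(host, job) for job in pending):
            chosen.append(host)
    return chosen
```

With six powered-off 8GB GigE hosts, six powered-off 16GB IB hosts and
one pending job asking for 12GB and ib=true, this picks one of the IB
hosts and never touches the 8GB ones.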


On 2/18/09, brs <brs at usf.edu> wrote:
> Orion,
> I've been pretty interested in this type of approach to cut electrical
> costs.  Looking at the code, it does not appear that it references any
> host or queue complexes in order to make a determination for which hosts
> to bring up.  What would be even better (more reliable) than using a WOL
> magic packet a la ether-wake would be to use IPMI instead (most of our
> nodes support it, thankfully).  Perhaps implementing complex attributes
> that include this information and referencing them from the script would
> be nice.
> Say we have a batch of 12 nodes, 6 of which have 8GB of memory and are
> GigE-connected and we have 6 other nodes with 16GB of RAM connected with
> IB.  Let's say that your job requests h_vmem (or something) for 12GB and
> requests ib=true (your complex attribute for infiniband).  Obviously, it
> would be undesirable for the script to start up any of the 8GB, GigE
> nodes as they would not correspond with the requirements of the job(s)
> that is(are) waiting.  You could also have some "hidden" complexes that
> include information like ipmi=true or wol=true so that all configuration
> is centralized in SGE (so we don't have to maintain a separate
> configuration).  This would allow the script to determine the best
> method for booting or powering off a host based on the configuration in SGE.
> I'd be interested in working on a project to implement this kind of
> behavior, but I thought Hedeby was supposed to facilitate this?
> -Brian
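
A minimal sketch of the "hidden complexes" idea, assuming the per-host
complex values have already been read into a dict. The ipmi and wol
complex names come from Brian's proposal; the power_on_command()
function, the "-bmc" hostname suffix and the default BMC user are
placeholders of mine. It shells out to ipmitool rather than the Python
IPMI module mentioned above, and expects the password in the
IPMI_PASSWORD environment variable (ipmitool's -E option). IPMI is
preferred over WOL when both are available, as Brian suggests.

```python
def power_on_command(host, complexes, mac=None, bmc_user="admin"):
    """Build the command that would power on `host`, or None if no method applies.

    `complexes` is a dict of complex name -> string value for this host,
    e.g. {"ipmi": "true", "wol": "true"} (hypothetical layout).
    """
    if complexes.get("ipmi") == "true":
        # Assumption: the BMC is reachable as "<host>-bmc".
        return [
            "ipmitool", "-I", "lanplus",
            "-H", host + "-bmc",
            "-U", bmc_user,
            "-E",  # read the password from $IPMI_PASSWORD
            "chassis", "power", "on",
        ]
    if complexes.get("wol") == "true" and mac:
        # Fall back to a Wake-on-LAN magic packet via ether-wake.
        return ["ether-wake", mac]
    return None
```

The returned list can be handed to subprocess.run() by the caller, which
keeps the decision logic separate from actually touching the hardware.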
> opoplawski wrote:
> > This is a script that I run here to wake up compute machines when
> > needed.  We have our nodes configured to power off when not in use and
> > this script wakes them up when there are waiting jobs.  Currently on
> > each run it only wakes up one machine at a time.  It is run every 2 minutes.
> >
> > The only other needed piece is the "wakeup" script.  This basically has
> > a list of MAC addresses for the hostnames and runs ether-wake to wake
> > the appropriate machine.
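
For anyone without ether-wake handy, here is a sketch of what such a
"wakeup" helper might do in Python. The magic packet layout (six 0xFF
bytes followed by the target MAC repeated 16 times) is standard; the
function names are mine, and note that this sends the packet as a UDP
broadcast on port 9, whereas ether-wake sends a raw Ethernet frame, so
it is not a drop-in equivalent of Orion's script.

```python
import socket


def magic_packet(mac):
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet over UDP (assumes the NIC/BIOS has WOL enabled)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

The hostname-to-MAC list would live alongside this, for example as a
plain dict, and wake() would be called with the looked-up address.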
> >
> > Hopefully someone else will find it useful.
> >
> >
> --
> Brian Smith
> Sr. HPC Systems Administrator
> Research Computing, University of South Florida
> 4202 E. Fowler Ave. ENB308
> Office Phone: +1 813 974-1467
> Organization URL: http://rc.usf.edu
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=109246
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

