[GE users] Waking up hosts when needed

brs brs at usf.edu
Wed Feb 18 21:24:46 GMT 2009


I've been pretty interested in this type of approach to cut electrical 
costs.  Looking at the code, it does not appear to reference any host or 
queue complexes when deciding which hosts to bring up.  Also, using IPMI 
would be more reliable than sending a WOL magic packet a la ether-wake 
(most of our nodes support it, thankfully).  Perhaps it would be nice to 
implement complex attributes that include this information and reference 
them from the script.
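
To make the WOL vs. IPMI difference concrete, here is a rough Python 
sketch of what each method boils down to.  This is not taken from the 
posted script; the BMC hostname, user name and password file path are 
made-up placeholders, and I'm assuming ipmitool is installed and the BMCs 
speak IPMI-over-LAN (lanplus).

```python
def magic_packet(mac):
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must be 6 bytes: %r" % mac)
    return b"\xff" * 6 + raw * 16


def ipmi_power_command(bmc_host, user, password_file, action="on"):
    """Build an ipmitool argv list for a chassis power action.

    bmc_host, user and password_file are placeholders for site-specific
    values; nothing here is read from SGE.
    """
    if action not in ("on", "off", "soft", "status"):
        raise ValueError("unsupported power action: %r" % action)
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host,
        "-U", user,
        "-f", password_file,   # read the password from a file, not argv
        "chassis", "power", action,
    ]


if __name__ == "__main__":
    # Print what would be sent or run; nothing is actually powered on.
    print(len(magic_packet("00:11:22:33:44:55")), "byte magic packet")
    print(" ".join(ipmi_power_command("node07-bmc", "admin", "/etc/ipmi.pass")))
```

The practical difference is that the magic packet is fire-and-forget, 
while the IPMI route also lets you ask for "chassis power status" 
afterwards to confirm that the node actually came up.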

Say we have a batch of 12 nodes: 6 have 8GB of memory and are 
GigE-connected, and the other 6 have 16GB of RAM and are connected with 
IB.  Let's say that your job requests h_vmem (or something) for 12GB and 
requests ib=true (your complex attribute for InfiniBand).  Obviously, it 
would be undesirable for the script to start up any of the 8GB, GigE 
nodes, as they would not meet the requirements of the waiting job(s).  
You could also have some "hidden" complexes that include information 
like ipmi=true or wol=true so that all configuration is centralized in 
SGE (so we don't have to maintain a separate configuration).  This would 
allow the script to determine the best method for booting or powering 
off a host based on the configuration in SGE.
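
The selection logic for that 12-node example could be sketched roughly 
like this in Python.  The host names and the hard-coded attribute table 
are hypothetical; a real script would presumably pull the complex values 
out of SGE (e.g. by parsing qhost -F output) rather than keep them in 
the script, and h_vmem is expressed here simply as a number of GB.

```python
# Hypothetical inventory of powered-off hosts and their complex values.
# node01-06: 8GB, GigE, WOL only.  node07-12: 16GB, IB, IPMI capable.
HOSTS = {}
for i in range(1, 7):
    HOSTS[f"node{i:02d}"] = {"h_vmem": 8, "ib": False, "ipmi": False, "wol": True}
for i in range(7, 13):
    HOSTS[f"node{i:02d}"] = {"h_vmem": 16, "ib": True, "ipmi": True, "wol": True}


def satisfies(attrs, request):
    """Return True if a host's attributes meet every requested resource."""
    for name, wanted in request.items():
        have = attrs.get(name)
        if isinstance(wanted, bool):
            # Boolean complexes (ib=true) must match exactly.
            if bool(have) != wanted:
                return False
        elif have is None or have < wanted:
            # Numeric complexes (h_vmem) must be at least the request.
            return False
    return True


def hosts_to_wake(request, hosts):
    """List the powered-off hosts that could run a job with this request."""
    return sorted(name for name, attrs in hosts.items() if satisfies(attrs, request))


def wake_method(attrs):
    """Prefer IPMI when the host advertises it, fall back to WOL."""
    if attrs.get("ipmi"):
        return "ipmi"
    if attrs.get("wol"):
        return "wol"
    return None


if __name__ == "__main__":
    request = {"h_vmem": 12, "ib": True}   # the waiting job from the example
    for name in hosts_to_wake(request, HOSTS):
        print(name, wake_method(HOSTS[name]))
```

With the 12GB, ib=true request only the six 16GB IB nodes are 
candidates, and since they advertise ipmi=true they would be powered on 
via IPMI rather than a magic packet.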

I'd be interested in working on a project to implement this kind of 
behavior, but I thought Hedeby was supposed to facilitate this?


opoplawski wrote:
> This is a script that I run here to wake up compute machines when 
> needed.  We have our nodes configured to power off when not in use and 
> this script wakes them up when there are waiting jobs.  Currently on 
> each run it only wakes up one machine at a time.  It is run every 2 minutes.
> The only other needed piece is the "wakeup" script.  This basically has 
> a list of MAC addresses for the hostnames and runs ether-wake to wake 
> the appropriate machine.
> Hopefully someone else will find it useful.

Brian Smith
Sr. HPC Systems Administrator
Research Computing, University of South Florida
4202 E. Fowler Ave. ENB308 
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu

