[GE users] Waking up hosts when needed
orion at cora.nwra.com
Thu Feb 19 00:07:43 GMT 2009
> I've been pretty interested in this type of approach to cut electrical
> costs. Looking at the code, it does not appear that it references any
> host or queue complexes in order to make a determination for which hosts
> to bring up. What would be even better (more reliable) than using a WOL
> magic packet a la ether-wake would be to use IPMI instead (most of our
> nodes support it, thankfully). Perhaps implementing complex attributes
> that include this information and referencing them from the script would
> be nice.
Nope, we have a hardcoded list of MAC addrs in our wakeup script. I
suppose you could pull a MAC addr out of a host configuration. I think
most "complexes" are unavailable when a host is down, though. Or perhaps
not. I don't have IPMI, so I don't use it. But "wakeup" could be
implemented however you want.
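
To illustrate the hardcoded-MAC approach, here is a minimal Python sketch, not the actual script discussed in this thread. The host names and MAC addresses in HOST_MACS are placeholders, and the function names are made up for the example. It sends the magic packet as a UDP broadcast to port 9 rather than as a raw Ethernet frame the way ether-wake does; both carry the same payload.

```python
import socket

# Hypothetical host-to-MAC table; names and addresses are placeholders.
HOST_MACS = {
    "node01": "00:11:22:33:44:55",
    "node02": "00:11:22:33:44:56",
}


def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN payload: six 0xFF bytes, then the MAC 16 times."""
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    return b"\xff" * 6 + hw * 16


def wakeup(host: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet for `host` as a UDP broadcast."""
    packet = magic_packet(HOST_MACS[host])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling wakeup("node01") would broadcast the packet on the local subnet. An IPMI-based wakeup could replace the body of wakeup() without changing its callers.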
> Say we have a batch of 12 nodes, 6 of which have 8GB of memory and are
> GigE-connected and we have 6 other nodes with 16GB of RAM connected with
> IB. Let's say that your job requests h_vmem (or something) for 12GB and
> requests ib=true (your complex attribute for infiniband). Obviously, it
> would be undesirable for the script to start up any of the 8GB, GigE
> nodes as they would not correspond with the requirements of the job(s)
> that is(are) waiting. You could also have some "hidden" complexes that
> include information like ipmi=true or wol=true so that all configuration
> is centralized in SGE (so we don't have to maintain a separate
> configuration). This would allow the script to determine the best
> method for booting or powering off a host based on the configuration in SGE.
Yeah, haven't tried to tackle that - haven't really needed to in our setup.
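
For reference, the host selection described in the quoted example could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not part of any existing script: the Host class, the host names, and the wake_method field are hypothetical, and the inventory is hardcoded here, whereas the proposal above would read it from SGE complex attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    mem_gb: int
    ib: bool
    wake_method: str  # hypothetical "hidden" attribute: "ipmi" or "wol"


# Hypothetical inventory mirroring the example: 6 GigE nodes with 8GB
# and 6 InfiniBand nodes with 16GB.
HOSTS = [Host(f"gige{i:02d}", 8, False, "wol") for i in range(1, 7)] + [
    Host(f"ib{i:02d}", 16, True, "ipmi") for i in range(1, 7)
]


def candidates(hosts, need_mem_gb, need_ib):
    """Return the hosts whose attributes satisfy the job's requests."""
    return [
        h for h in hosts
        if h.mem_gb >= need_mem_gb and (h.ib or not need_ib)
    ]


# A job asking for 12GB and ib=true should only match the IB nodes.
eligible = candidates(HOSTS, need_mem_gb=12, need_ib=True)
for h in eligible:
    print(h.name, h.wake_method)
```

The wake_method field is where a per-host choice between IPMI and WOL would be looked up before powering a node on.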
> I'd be interested in working on a project to implement this kind of
> behavior, but I thought hedeby was supposed to facilitate this?
I got tired of waiting. Also, I thought that it would actually be a
while before hedeby could handle "offline" resources as opposed to
moving running resources. Would be good to find out plans/schedule
before investing any more "smarts" into my script.
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane orion at cora.nwra.com
Boulder, CO 80301 http://www.cora.nwra.com