[GE users] Waking up hosts when needed

opoplawski orion at cora.nwra.com
Thu Feb 19 00:07:43 GMT 2009


brs wrote:
> Orion,
> 
> I've been pretty interested in this type of approach to cut electrical 
> costs.  Looking at the code, it does not appear that it references any 
> host or queue complexes in order to make a determination for which hosts 
> to bring up.  What would be even better (more reliable) than using a WOL 
> magic packet a la ether-wake would be to use IPMI instead (most of our 
> nodes support it, thankfully).  Perhaps implementing complex attributes 
> that include this information and referencing them from the script would 
> be nice.

Nope, we have a hardcoded list of MAC addrs in our wakeup script.  I 
suppose you could pull a MAC addr out of a host configuration.  I think 
most "complexes" are unavailable though when a host is down.  Or perhaps 
I'm misunderstanding.

I don't have IPMI, so I don't use it.  But "wakeup" could be implemented 
however you want.
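
For illustration only, here is a minimal sketch of what a WOL-style 
"wakeup" with a hardcoded MAC list could look like in Python.  This is 
not our actual script: the host names, MAC addresses, and function names 
are made up, and it sends the magic packet as a UDP broadcast rather 
than the raw Ethernet frame that ether-wake uses.

```python
import socket

# Hypothetical host -> MAC table; these addresses are placeholders.
MACS = {
    "node01": "00:11:22:33:44:55",
    "node02": "00:11:22:33:44:56",
}


def build_magic_packet(mac):
    """Return the WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    return b"\xff" * 6 + raw * 16


def wakeup(host, broadcast="255.255.255.255", port=9):
    """Broadcast a magic packet for `host` over UDP (port 9 is an assumption)."""
    packet = build_magic_packet(MACS[host])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling wakeup("node01") would send one packet; an IPMI-based version 
could keep the same function name and swap out the body.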

> Say we have a batch of 12 nodes, 6 of which have 8GB of memory and are 
> GigE-connected and we have 6 other nodes with 16GB of RAM connected with 
> IB.  Let's say that your job requests h_vmem (or something) for 12GB and 
> requests ib=true (your complex attribute for infiniband).  Obviously, it 
> would be undesirable for the script to start up any of the 8GB, GigE 
> nodes as they would not correspond with the requirements of the job(s) 
> that is(are) waiting.  You could also have some "hidden" complexes that 
> include information like ipmi=true or wol=true so that all configuration 
> is centralized in SGE (so we don't have to maintain a separate 
> configuration).  This would allow the script to determine the best 
> method for booting or powering off a host based on the configuration in SGE.

Yeah, haven't tried to tackle that - haven't really needed to in our setup.
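
If someone did want to tackle it, the selection step might be sketched 
roughly as below.  This assumes the per-host attributes live in a static 
table (since a powered-off host may not be reporting anything); the host 
names, attribute keys, and the candidates() helper are all hypothetical 
and nothing here queries SGE.

```python
# Hypothetical static table of powered-off hosts; names and values are placeholders.
HOSTS = {
    "gige01": {"mem_gb": 8, "ib": False, "wake": "wol"},
    "ib01": {"mem_gb": 16, "ib": True, "wake": "ipmi"},
}


def candidates(hosts, need_mem_gb, need_ib):
    """Return sorted names of hosts that satisfy the memory and IB request."""
    return sorted(
        name
        for name, attrs in hosts.items()
        if attrs["mem_gb"] >= need_mem_gb and (attrs["ib"] or not need_ib)
    )
```

For the 12GB, ib=true job in your example, candidates(HOSTS, 12, True) 
would skip the 8GB GigE host, and the "wake" field could then pick the 
wakeup method for whatever is left.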

> I'd be interested in working on a project to implement this kind of 
> behavior, but I thought hedeby was supposed to facilitate this?

I got tired of waiting.  Also, I thought that it would actually be a 
while before hedeby could handle "offline" resources as opposed to 
moving running resources.  Would be good to find out plans/schedule 
before investing any more "smarts" into my script.


-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=109343
