[GE users] Virtualization and GridEngine

daireb Daire.Byrne at framestore.com
Mon Nov 10 18:09:33 GMT 2008


Ignacio,

Eek... two different threads for us to swap emails in!

----- "Ignacio Martin Llorente" <llorente at dacya.ucm.es> wrote:

> > In our case I can see our compute farm becoming a useful way of  
> > providing cheap development 'hardware' for software engineers to  
> > play around with (a la EC2). I doubt we are going to run a permanent
> > home server on a virtual cluster anytime soon but certainly  
> > development servers and compute farm execution hosts are good  
> > candidates.
> 
> Sure, but if you are planning to do this, why don't you virtualize the
> whole cluster? Then you can boot the execution hosts on demand with the
> required pre-configured environments, the development servers... with
> added benefits such as consolidation, dynamic resizing...
> As an alternative, if for performance reasons you want to use bare metal
> for the execution of the jobs, you could use SGE for the management of
> the jobs, and OpenNebula for the management of the VMs in the same
> cluster. In both managers you can specify the hosts, so you can
> dynamically allocate hosts to both managers.

Maybe one day it will be possible to simply virtualise the whole cluster. However, as it stands there are still performance advantages to using bare metal and maximising resources. In the case of desktop machines I think it will be a while before you can virtualise them and get good performance (PCI passthrough is in the works). We are interested in launching VMs dynamically on desktop machines to make use of idle CPU cycles - it may be that OpenNebula would be a better option for this. I thought it might be a smoother transition for us to allow jobs and VMs to run side by side on the compute farm, so that we can roll out VM usage in stages without completely overhauling everything in one go.

But I do get your points.

> Job priority is used to allocate available slots. When your cluster
> has 5 slots, the manager submits the 5 jobs with the highest priority.
> When you submit a service with an ordering to start the VMs, the manager
> cannot boot a VM until the previous one has finished the booting
> process. In addition you need a rollback process in case one of the
> VMs fails...

I see what you mean. I suppose there is always a way though - maybe use job dependencies? A job has to run inside the VM before the next VM (job) can start. Rollback might be done by restarting/resubmitting the dependency from scratch and/or having the job exit with status 99 so that it gets rescheduled on failure. But again these are just hacks to try to replicate a proper VM manager like OpenNebula. It all comes back to whether there really are any advantages in using the same scheduler to manage VMs and jobs simultaneously... You're starting to convince me!
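
To make the job-dependency idea concrete, here is a minimal sketch, not a tested setup. It only builds the qsub command lines for a chain of VM "boot" jobs, using the standard -N (job name) and -hold_jid (wait for an earlier job) options. The wrapper script boot_vm.sh, the boot_<vm> job names and the VM names are all hypothetical; the wrapper is assumed to exit only once its VM is up, and to exit with status 99 if the boot fails so that the job is requeued.

```python
# Sketch only: chain VM boot jobs with Grid Engine job dependencies.
# boot_vm.sh and the "boot_<vm>" job names are hypothetical placeholders.

RESCHEDULE_EXIT_CODE = 99  # status the hypothetical wrapper would exit with to be requeued


def build_chain(vm_names, script="boot_vm.sh"):
    """Return one qsub argument list per VM, each held on the previous job."""
    commands = []
    previous = None
    for vm in vm_names:
        job_name = f"boot_{vm}"
        cmd = ["qsub", "-N", job_name]
        if previous is not None:
            # Do not start this job until the previous boot job has finished.
            cmd += ["-hold_jid", previous]
        cmd += [script, vm]
        commands.append(cmd)
        previous = job_name
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be inspected first.
    for cmd in build_chain(["db", "app", "web"]):
        print(" ".join(cmd))
```

Each command could then be handed to subprocess.run on a submit host. Note that this only gives ordering; there is still no real rollback if a VM in the middle of the chain never comes up, which is the gap a dedicated VM manager is meant to fill.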

> My experience is that you can get that but at the expense of  
> efficiency and functionality. I understand what you mean. We come from 
> the computing world and we originally thought that was possible.  
> However job and VM management are quite different. Now we see the  
> benefits of the decoupling between job and VM management.

I started this thread to try and get my head around the differences between job scheduling and emerging VM managers like OpenNebula. Your input has certainly helped - thanks!

Daire

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88412
