[GE users] Virtualization and GridEngine

Daire Byrne Daire.Byrne at framestore.com
Fri Nov 7 11:38:21 GMT 2008



----- "Ignacio Martin Llorente" <llorente at dacya.ucm.es> wrote:

> > I was thinking that this could be achieved by using a specially  
> > configured queue in SGE and custom "method" scripts. The host OS  
> > could run normal execution jobs in its normal queue and VMs could  
> > be started in the special queue - maximum total slots would be  
> > managed with an RQS. Of course I'm probably not fully understanding 
> > the finer points of VM management and simplifying matters somewhat.
> My comment is more related to the attributes required to define a VM 
> (image file attributes, network attributes, input device, booting  
> information...)

I suppose I was thinking of just doing the libvirt XML creation externally from SGE. We are not so concerned about nice GUIs and tools that we could sell on - just stuff that works. If an organisation already uses SGE to manage their compute farm then I'm interested in exploring the possibility of managing the VMs within the same system and what benefits or caveats that would bring.
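As a rough illustration of "doing the libvirt XML creation externally", here is a minimal Python sketch (standard library only) that writes a domain definition covering the attribute groups Ignacio lists: image file, network, input device and boot information. The function name, VM name, image path and sizes are hypothetical; the element names follow libvirt's domain XML format, but a real definition would likely need more (console, graphics, CPU model and so on). The job script running in the VM queue could then start the result with something like "virsh create vm.xml".

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name: str, image_path: str, memory_mib: int,
                     vcpus: int, network: str = "default") -> str:
    """Return a minimal libvirt <domain> definition as a string.

    Hypothetical helper: all values are supplied by the caller.
    """
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    # libvirt accepts a unit attribute on <memory>; MiB keeps numbers readable.
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)

    # Booting information: full virtualisation, boot from the first disk.
    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type").text = "hvm"
    ET.SubElement(os_el, "boot", dev="hd")

    devices = ET.SubElement(domain, "devices")
    # Image file attributes: a file-backed disk exposed as a virtio device.
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "source", file=image_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")
    # Network attributes: attach to a named libvirt virtual network.
    iface = ET.SubElement(devices, "interface", type="network")
    ET.SubElement(iface, "source", network=network)
    # Input device: a USB tablet for pointer handling.
    ET.SubElement(devices, "input", type="tablet", bus="usb")

    return ET.tostring(domain, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical VM name and image path.
    print(build_domain_xml("dev01", "/var/lib/images/dev01.img", 2048, 2))
```

Keeping the XML generation outside SGE like this means SGE only has to start, suspend and kill the job that owns the VM, which matches the division of labour suggested above.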

> >> 3  VM duration: VM runs for very long time periods ("forever")
> >
> > Are long running jobs problematic in SGE? I understand that there is
> > a big difference between running short dynamic execution host VMs  
> > and high availability server VMs - you really need the server VMs to
> > be extremely robust. But there may be some benefits in being able to  
> > blur the lines between a compute farm and a server farm - a  
> > specialised application of hardware consolidation.
> I see that you have in mind the application of VMs to provide jobs  
> with a pre-defined execution environment.

That would most probably be the main application for us, but once you do that it is not such a leap to manage "permanent" server clusters in the same way. A GUI and associated tools then become more important, which may or may not be more work than it is worth. In our case I can see our compute farm becoming a useful way of providing cheap development 'hardware' for software engineers to play around with (à la EC2). I doubt we are going to run a permanent production server on a virtual cluster anytime soon, but development servers and compute farm execution hosts are certainly good candidates.

> >> 4  VM groups (services): VMs are not independent entities. A services
> >> consist of a group of interconnected VMs (not array in the job
> >> management sense)
> >
> > I suppose acting on groups of jobs could be done with job name  
> > wildcards? So to suspend a group of VMs: qmod -sj 'servers-*'.  
> > Adding and removing VMs from the group is just an exercise in renaming jobs  
> > (qalter -N). Perhaps parallel jobs can also be used to define VM  
> > groups? In this case however I'm not sure you can dynamically add  
> > and remove jobs/VMs.
> Notice that an example of service is a full SGE cluster, where you  
> have to create a VLAN between the running VMs, and could define an  
> ordering in the booting of the VMs.
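The job-name grouping idea quoted above (suspend with qmod -sj and a wildcard, move VMs between groups with qalter -N) can be sketched in Python as follows. This is only an illustration: the helper names and job IDs are hypothetical, it assumes qmod accepts a job-name pattern as suggested in the thread, and by default it just returns the command line instead of running it.

```python
import fnmatch
import shlex
import subprocess
from typing import Dict, List


def group_members(jobs: Dict[int, str], pattern: str) -> List[int]:
    """Return IDs of jobs whose name matches a shell-style pattern."""
    return sorted(jid for jid, name in jobs.items()
                  if fnmatch.fnmatchcase(name, pattern))


def suspend_group_cmd(pattern: str) -> List[str]:
    # Mirrors the suggestion in the thread: qmod -sj 'servers-*'
    # (assumes qmod accepts a job-name pattern here).
    return ["qmod", "-sj", pattern]


def rename_cmd(job_id: int, new_name: str) -> List[str]:
    # qalter -N renames a job, which moves it into or out of a name-based group.
    return ["qalter", "-N", new_name, str(job_id)]


def run(argv: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only if dry_run is False."""
    if not dry_run:
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # Hypothetical job IDs and names.
    jobs = {101: "servers-web", 102: "servers-db", 103: "render-7"}
    print(group_members(jobs, "servers-*"))
    print(run(suspend_group_cmd("servers-*")))
    print(run(rename_cmd(103, "servers-render")))
```

The group is purely a naming convention, so nothing enforces membership; that is also why parallel jobs look attractive for fixed groups, where the set of VMs does not change after submission.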

Again, I'm thinking that the libvirt XML defines your VLANs; SGE is only concerned with starting and managing the running VM. It is certainly not a complete (GUI) management solution, but it can provide all the underlying functionality. The boot ordering sounds like a job priority...
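Treating boot ordering as job priority could look roughly like the Python sketch below, which maps an ordered list of VMs to descending qsub -p values (SGE accepts -1023 to 1024, and ordinary users may only lower a job's priority, hence the default top of 0). The VM names, the step size and the start_vm.sh job script are hypothetical. One caveat worth stating: priority only influences the order in which pending jobs are dispatched; it does not make SGE wait until one VM has finished booting before starting the next.

```python
from typing import List, Tuple

MIN_PRIORITY = -1023  # lower bound accepted by qsub -p
MAX_PRIORITY = 1024   # upper bound; raising priority needs operator/manager rights


def boot_order_priorities(vms: List[str], top: int = 0,
                          step: int = 10) -> List[Tuple[str, int]]:
    """Assign descending priorities so earlier VMs are dispatched first.

    The default top=0 keeps all values non-positive, which any user may set.
    """
    result = []
    for i, name in enumerate(vms):
        prio = top - i * step
        if not MIN_PRIORITY <= prio <= MAX_PRIORITY:
            raise ValueError(f"priority {prio} for {name} is out of range")
        result.append((name, prio))
    return result


def qsub_argv(name: str, prio: int, script: str) -> List[str]:
    # -N names the job, -p sets its priority; the script is hypothetical.
    return ["qsub", "-N", name, "-p", str(prio), script]


if __name__ == "__main__":
    order = ["sge-master", "sge-exec1", "sge-exec2"]  # hypothetical VM names
    for vm, prio in boot_order_priorities(order):
        print(" ".join(qsub_argv(vm, prio, "start_vm.sh")))
```

If strict ordering mattered, the start script of each VM would have to check for its predecessor itself; job dependencies (qsub -hold_jid) wait for a job to finish, which never happens for a long-running VM.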

> > It also opens up the possibility of users running their own custom  
> > images à la cloud/EC2 computing. Using Hedeby to reboot machines into
> > different images is another approach but I like the idea that  
> > "normal" jobs and VMs can cohabit the same machine if slots are  
> > available.
> We have also used this approach at the Grid level. We were involved in a 
> project to deploy a Globus based Grid for the execution of  
> applications within pre-defined environments. See "Management of  
> Virtual Machines on Globus Grids Using GridWay" at
> http://dsa-research.org/doku.php?id=publications:grid:virtualization

Interesting reading - gracias (thanks)!

> > Again it seems "neater" to be able to use a single scheduler to  
> > manage both "levels". The ability to run VMs AND execution jobs  
> > concurrently on the same hardware has its advantages with many-core 
> > machines.
> Well, it depends on what you are interested in. Option A is a good  
> solution if you are interested in virtualization as a way to assure a 
> given execution environment for the jobs.

Maybe I'm just trying to oversimplify and merge A + B into a single system. There are advantages to having both virtualisation architectures aware of each other on the grid, and the easiest way to achieve that is to make both use the same system.
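To make the "VMs and normal jobs on the same host" idea from earlier in the thread concrete (a special VM queue, with an RQS capping total VM slots), here is a small Python sketch. It models the admission check that the combination is meant to enforce and renders the corresponding resource quota set text. The queue name vm.q, the rule name and the slot numbers are hypothetical; the rule layout follows the sge_resource_quota format, where "hosts {*}" applies the limit to each host separately.

```python
from dataclasses import dataclass


@dataclass
class Host:
    slots: int           # total slots on the host (e.g. one per core)
    job_slots_used: int  # slots held by normal batch jobs
    vm_slots_used: int   # slots held by running VMs


def can_start_vm(host: Host, vm_slots: int, vm_cap: int) -> bool:
    """A VM may start only if the host has free slots AND the VM cap allows it."""
    free = host.slots - host.job_slots_used - host.vm_slots_used
    within_cap = host.vm_slots_used + vm_slots <= vm_cap
    return vm_slots <= free and within_cap


def rqs_rule(queue: str, vm_cap: int, name: str = "max_vm_slots") -> str:
    """Render a resource quota set capping slots in `queue` on every host.

    `queue` (e.g. the hypothetical "vm.q") and `name` are placeholders.
    """
    return (
        "{\n"
        f"   name         {name}\n"
        '   description  "cap VM slots per host"\n'
        "   enabled      TRUE\n"
        f"   limit        queues {queue} hosts {{*}} to slots={vm_cap}\n"
        "}\n"
    )


if __name__ == "__main__":
    host = Host(slots=8, job_slots_used=4, vm_slots_used=2)
    print(can_start_vm(host, vm_slots=2, vm_cap=4))
    print(rqs_rule("vm.q", 4))
```

The rendered text could be saved to a file and loaded with "qconf -Arqs <file>". The host-wide limit (normal jobs plus VMs not exceeding the core count) still has to come from somewhere else, for example the slots value on the execution host.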

> Thanks for your words in Spanish!

De nada - estoy aprendiendo español y necesito mucha práctica! (You're welcome: I'm learning Spanish and need a lot of practice!)


