[GE users] Virtualization and GridEngine

Daire Byrne Daire.Byrne at framestore.com
Thu Nov 6 15:56:37 GMT 2008



Hi! It's interesting to hear that you looked at SGE before working on OpenNebula (BTW, I like it: it's very interesting!).

----- "Ignacio Martin Llorente" <llorente at dacya.ucm.es> wrote:

> 1. VM structure: The definition of a VM requires a special treatment:  
> images with fixed and variable parts for migration, contextualization 
> parameters...

I was thinking that this could be achieved by using a specially configured queue in SGE and custom "method" scripts. The host OS could run normal execution jobs in its normal queue, and VMs could be started in the special queue, with the maximum total number of slots managed by a resource quota set (RQS). Of course, I'm probably not fully understanding the finer points of VM management and am simplifying matters somewhat.
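To make the slot-sharing idea a bit more concrete, here is a small Python model (not SGE configuration) of one host with a normal batch queue and a dedicated VM queue, where a single per-host cap plays the role the RQS would play. The queue names `all.q` and `vm.q`, the slot counts and the `Host` class are all hypothetical; in SGE itself the cap would be a resource quota rule, roughly along the lines of `limit hosts {*} to slots=8`.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of one execution host shared by batch jobs and VMs."""
    name: str
    max_slots: int  # host-wide cap, standing in for the RQS rule
    queue_slots: dict = field(default_factory=dict)  # queue name -> per-queue slot limit
    used: dict = field(default_factory=dict)  # queue name -> slots currently in use

    def total_used(self) -> int:
        return sum(self.used.values())

    def can_start(self, queue: str, slots: int) -> bool:
        """A job fits only if both its queue and the host-wide cap have room."""
        queue_free = self.queue_slots.get(queue, 0) - self.used.get(queue, 0)
        host_free = self.max_slots - self.total_used()
        return slots <= queue_free and slots <= host_free

    def start(self, queue: str, slots: int) -> bool:
        """Reserve slots in the given queue if the job fits; report success."""
        if not self.can_start(queue, slots):
            return False
        self.used[queue] = self.used.get(queue, 0) + slots
        return True


# An 8-core host: each queue may use up to 8 slots, but together at most 8.
host = Host("node01", max_slots=8, queue_slots={"all.q": 8, "vm.q": 8})
host.start("all.q", 6)              # six batch job slots in use
print(host.can_start("vm.q", 4))    # False: would push the host to 10 slots
print(host.can_start("vm.q", 2))    # True: exactly fills the host
```

The point of the model is that neither queue needs to know about the other: the host-wide limit alone keeps VMs and normal jobs from oversubscribing the machine.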

> 2. VM life-cycle: VM management requires fixed and transient states
> for contextualization, live migration...

I agree that live migration is not easy in SGE. However, it may be possible to treat it as a "loose" integration problem: to migrate to a new machine, start a new job on that machine, which then pulls the VM across from the old machine. The job on the old machine could keep checking the status of the running VM and exit automatically once it detects that the VM has been migrated.
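The "old machine quits once the VM has gone" part might look roughly like the loop below, run as the job on the source host. This is only a sketch under assumptions: `get_vm_state` is a hypothetical callable (in practice it would wrap some hypervisor query), and the state names and poll interval are illustrative.

```python
import time
from typing import Callable, Optional


def watch_until_migrated(
    get_vm_state: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Keep the source-host job alive while the VM is still running locally.

    Returns the first state other than "running" (e.g. "migrated"), or
    "timeout" after max_polls checks. Once this returns, the SGE job on the
    old host can exit and release its slot. State names are hypothetical.
    """
    polls = 0
    while True:
        state = get_vm_state()
        if state != "running":
            return state
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return "timeout"
        sleep(poll_seconds)


# Simulated VM that reports "running" twice and is then migrated away.
states = iter(["running", "running", "migrated"])
print(watch_until_migrated(lambda: next(states), sleep=lambda _: None))  # migrated
```

Injecting `sleep` keeps the loop testable; a real wrapper script would simply call it with the defaults and exit when it returns.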

> 3. VM duration: VMs run for very long time periods ("forever")

Are long-running jobs problematic in SGE? I understand that there is a big difference between running short-lived, dynamic execution host VMs and high-availability server VMs: you really need the server VMs to be extremely robust. But there may be some benefit in being able to blur the line between a compute farm and a server farm, as a specialised application of hardware consolidation.

> 4. VM groups (services): VMs are not independent entities. A service
> consists of a group of interconnected VMs (not an array in the job
> management sense)

I suppose groups of jobs could be acted on using job name wildcards? For example, to suspend a group of VMs: qmod -sj 'servers-*'. Adding VMs to and removing them from the group is then just a matter of renaming jobs (qalter -N). Perhaps parallel jobs could also be used to define VM groups, although in that case I'm not sure you can dynamically add and remove jobs/VMs.
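The naming-convention approach could be sketched as follows: pick the jobs whose names match a group pattern, then build the corresponding command lines using only the `qmod -sj` and `qalter -N` options mentioned above. The job table, job IDs and the `servers-` prefix are made-up examples, and the commands are only built, not run, so the sketch works without an SGE installation. It issues one command per job ID, which avoids relying on how a particular SGE version expands wildcards in job names.

```python
from fnmatch import fnmatchcase


def select_group(jobs: dict[int, str], pattern: str) -> list[int]:
    """IDs of jobs whose names match a shell-style pattern such as 'servers-*'."""
    return sorted(jid for jid, name in jobs.items() if fnmatchcase(name, pattern))


def suspend_group_cmds(jobs: dict[int, str], pattern: str) -> list[list[str]]:
    """One 'qmod -sj <job_id>' command per job in the group."""
    return [["qmod", "-sj", str(jid)] for jid in select_group(jobs, pattern)]


def rename_cmd(job_id: int, new_name: str) -> list[str]:
    """Moving a VM between groups is just a rename with 'qalter -N'."""
    return ["qalter", "-N", new_name, str(job_id)]


# Made-up job table: job ID -> job name.
jobs = {101: "servers-web", 102: "servers-db", 103: "render-frame42"}
print(suspend_group_cmds(jobs, "servers-*"))
print(rename_cmd(103, "servers-cache"))
```

On a submit host, each command list could then be passed to `subprocess.run(cmd, check=True)`.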

> 5. VM elasticity: Groups of VMs can grow to satisfy a given SLO, or
> you could even dynamically update the memory or CPU requirements of a
> running VM

I agree this might be hard to do satisfactorily. Once a job starts running in SGE, changing its resource requirements has no effect; you would have to suspend/restart or migrate the job for the change to take effect.

> 6. Finally, the aim of the scheduling heuristics is different. While
> in job management we try to optimize performance criteria such as
> turnaround time, throughput..., in VM management we focus on capacity
> provision, for example the probability of SLA violation for a given
> cost of provisioning, including support for server consolidation,
> partitioning...

I would have thought that some custom heuristics could be added to execution hosts via load sensors, but I do see your point. I have obviously not given SLAs the kind of thought that you have during the development of OpenNebula. I also have no experience with Hedeby and what it is capable of in this respect.
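For reference, an SGE load sensor is a long-running program started by sge_execd (configured through the load_sensor parameter). Each time execd writes a line to its stdin, the sensor prints a report of `host:name:value` lines framed by `begin` and `end`, and it exits when the line is `quit`. A minimal Python sketch of that loop follows; the complex name `vm_headroom` and its fixed value are hypothetical, and such a complex would also have to be defined (e.g. with `qconf -mc`) before the scheduler could use it.

```python
import socket
import sys
from typing import Callable, Dict, Optional, TextIO


def format_report(host: str, values: Dict[str, str]) -> str:
    """One load report: 'begin', then host:name:value lines, then 'end'."""
    lines = ["begin"]
    lines += [f"{host}:{name}:{value}" for name, value in values.items()]
    lines.append("end")
    return "\n".join(lines) + "\n"


def run_sensor(
    measure: Callable[[], Dict[str, str]],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
    host: Optional[str] = None,
) -> None:
    """Answer each request line from sge_execd until it sends 'quit'."""
    # The host name must match the name SGE uses for this execution host.
    host = host or socket.gethostname()
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, measure()))
        stdout.flush()  # execd waits for the complete report


def vm_headroom() -> Dict[str, str]:
    """Hypothetical metric: slots still free for VMs on this host."""
    return {"vm_headroom": "4"}
```

The script configured as the load sensor would simply call `run_sensor(vm_headroom)` from its entry point; a real `measure` function would compute the value instead of returning a constant.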

> That does not mean that virtualization cannot be integrated with job
> managers. I know two approaches:
> A. VMs to Provide pre-Created Software Environments for Jobs

> As described by Andreas, some job managers provide extensions of job
> execution managers to create VMs on a per-job basis, so as to provide a
> pre-defined environment for job execution. Those approaches still manage
> jobs, and the VMs are bound to a given host and only exist during job
> execution.

I think this is probably closer to the usage I would be interested in. If we have legacy applications that need to be run on the farm from time to time, it would be useful to start the (old) OS image on demand: using a job dependency with a hard resource request, it should be possible to start a VM which itself runs an SGE client (or to use ssh, as Andreas suggested). Despite the long startup time of the VM and the reduced network performance, this could be a neat way to provision the OS running on the farm automatically and dynamically. It also opens up the possibility of users running their own custom images, à la cloud/EC2 computing. Using Hedeby to reboot machines into different images is another approach, but I like the idea that "normal" jobs and VMs can cohabit the same machine if slots are available.

> B. Job Managers on top of a Virtualized infrastructure
> An SGE cluster service can run on top of a virtual infrastructure,
> managed for example by OpenNebula, see:
> http://gridgurus.typepad.com/grid_gurus/2008/10/elastic-managem.html
> Notice that this approach provides a full separation between the  
> service and the infrastructure. In other words, you run two  
> independent managers: the job manager and the VM manager. In addition,  
> you could add a new manager, the service manager. For example Hedeby
> (http://hedeby.sunsource.net/ 
> ) could be used to request new virtual worker nodes on demand when the
> number of pending jobs exceed a given threshold.

I have considered this option too. There was a recent article about running SGE on an Amazon EC2 cluster using Hedeby (as you suggest), which was quite interesting (http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2). Again, it seems "neater" to be able to use a single scheduler to manage both "levels". The ability to run VMs AND execution jobs concurrently on the same hardware has its advantages with many-core machines.

Many thanks for your reply, and good luck with OpenNebula!
