[GE users] Virtualization and GridEngine

Ignacio Martin Llorente llorente at dacya.ucm.es
Thu Nov 6 22:39:58 GMT 2008



> ----- "Ignacio Martin Llorente" <llorente at dacya.ucm.es> wrote:
>> 1. VM structure: The definition of a VM requires a special treatment:
>> images with fixed and variable parts for migration, contextualization
>> parameters...
> I was thinking that this could be achieved by using a specially  
> configured queue in SGE and custom "method" scripts. The host OS  
> could run normal execution jobs in its normal queue and VMs could  
> be started in the special queue - maximum total slots would be  
> managed with an RQS. Of course I'm probably not fully understanding  
> the finer points of VM management and simplifying matters somewhat.
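For reference, the "specially configured queue plus RQS" idea above could be sketched roughly like this. The queue name (vm.q), quota name, and slot count below are invented for illustration, not part of any standard setup:

```shell
# Create a dedicated queue for VM jobs (opens an editor where slots,
# prolog/epilog "method" scripts, etc. can be set):
qconf -aq vm.q

# Cap the combined slots used on each host, so VM jobs in vm.q and normal
# jobs in the default queue draw from one budget. The quota rule would
# look like:
#
#   {
#     name         host_slots
#     enabled      TRUE
#     limit        hosts {*} to slots=8
#   }
#
qconf -arqs host_slots
```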

My comment is more related to the attributes required to define a VM  
(image file attributes, network attributes, input device, booting...).
>> 2  VM life-cycle: VM management requires fixed and transient states
>> for contextualization, live migration...
> I agree the live migration example is not easy in SGE. However it  
> may be possible to consider it as a "loose" integration problem. To  
> migrate to a new machine start a new job on that machine which then  
> sucks the job from the old machine. Maybe the old machine is  
> constantly checking the status of the running VM and automatically  
> quits once it detects the VM has been migrated - or something.

Good point
>> 3  VM duration: VMs run for very long time periods ("forever")
> Are long running jobs problematic in SGE? I understand that there is  
> a big difference between running short dynamic execution host VMs  
> and high availability server VMs - you really need the server VMs to  
> be extremely robust. But there may be some benefits in being able to  
> blur the lines between a compute farm and a server farm - a  
> specialised application of hardware consolidation.

I see that you have in mind the application of VMs to provide jobs  
with a pre-defined execution environment.

>> 4  VM groups (services): VMs are not independent entities. A service
>> consists of a group of interconnected VMs (not an array in the job
>> management sense)
> I suppose acting on groups of jobs could be done with job name  
> wildcards? So to suspend a group of VMs: qmod -sj 'servers-*'.  
> Adding and removing VMs from the group is just an exercise in  
> renaming jobs (qalter -N). Perhaps parallel jobs can also be used to  
> define VM groups? In this case, however, I'm not sure you can  
> dynamically add and remove jobs/VMs.

Notice that an example of a service is a full SGE cluster, where you  
have to create a VLAN between the running VMs and may need to define  
an ordering for booting the VMs.
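The wildcard and renaming approach suggested above would look roughly like this (the job names and job id are made up):

```shell
# Suspend every VM job whose name starts with "servers-":
qmod -sj 'servers-*'

# Resume the whole group again:
qmod -usj 'servers-*'

# Add or remove a VM from the group by renaming its job:
qalter -N servers-db01 1234
```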

>> 5  VM elasticity: Groups of VMs can grow to satisfy a given SLO, or
>> you could even dynamically update the memory or CPU requirements of
>> a running VM
> I agree this might be hard to do satisfactorily. Once a job starts  
> running in SGE changing its resource requirements doesn't have any  
> effect. You would also have to suspend/restart or migrate the job  
> for it to take effect.
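To illustrate the point about resource changes, a sketch (the job id is made up, and h_vmem is only meaningful if configured as a consumable at your site):

```shell
# Raise the memory request of job 1234. For a pending job this changes
# where it can be scheduled; for a running job it has no effect until the
# job is rescheduled:
qalter -l h_vmem=8G 1234

# Reschedule the job so the new request is honoured (requires the job or
# queue to be marked rerunnable):
qmod -r 1234
```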
>> 6  Finally, the aim of the scheduling heuristics is different. While
>> in job management we try to optimize performance criteria such as
>> turnaround time, throughput...; in VM management, we focus on
>> capacity provisioning, for example the probability of SLA violation
>> for a given cost of provisioning, including support for server
>> consolidation, partitioning...
> I would have thought some custom heuristics could be added to  
> execution hosts via "load_sensors" but I do see your point. I have  
> obviously not given the subject of SLAs the kind of thought that you  
> have during the development of OpenNebula. I also have no experience  
> of Hedeby and what it is capable of in this respect.
>> That does not mean that virtualization can not be integrated with job
>> managers. I know two approaches:
>> A. VMs to Provide pre-Created Software Environments for Jobs
>> As described by Andreas, some job managers provide extensions of job
>> execution managers to create VMs on a per-job basis so as to provide
>> a pre-defined environment for job execution. Those approaches still
>> manage jobs, and the VMs are bound to a given host and only exist
>> during job execution.
> I think this is probably more the usage I would be interested in. If  
> we have legacy applications that need to be run on the farm from  
> time to time it would be useful to start the (old) OS image and  
> using a job dependency with a hard job resource request it should be  
> possible to start a VM which also runs an SGE client within it (or  
> use ssh as Andreas suggested). Despite the long startup time of the  
> VM and the reduced network performance this could be a neat way to  
> automatically and dynamically provision the OS running on the farm.  
> It also opens up the possibility of users running their own custom  
> images à la cloud/EC2 computing. Using Hedeby to reboot machines into  
> different images is another approach but I like the idea that  
> "normal" jobs and VMs can cohabit the same machine if slots are  
> available.

We have also used this approach at Grid level. We were involved in a  
project to deploy a Globus based Grid for the execution of  
applications within pre-defined environments. See "Management of  
Virtual Machines on Globus Grids Using GridWay" at http://dsa-research.org/doku.php?id=publications:grid:virtualization
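A minimal sketch of the per-job VM pattern described above, assuming the VM image runs an SGE execution daemon and the virtual host advertises an invented boolean complex, legacy_os (the queue, script names, and complex are all assumptions):

```shell
# 1. Boot the legacy-OS VM as a job in a dedicated queue; inside the
#    image, sge_execd starts and the virtual host joins the cluster
#    reporting legacy_os=true:
qsub -q vm.q -N boot-legacy-vm start_vm.sh

# 2. The legacy application stays pending until some execution host --
#    here, the freshly booted VM -- offers the requested resource:
qsub -l legacy_os=true run_legacy_app.sh
```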
>> B. Job Managers on top of a Virtualized infrastructure
>> A SGE cluster service can run on top of a virtual infrastructure,
>> managed for example by OpenNebula, see:
>> http://gridgurus.typepad.com/grid_gurus/2008/10/elastic-managem.html
>> Notice that this approach provides a full separation between the
>> service and the infrastructure. In other words, you run two
>> independent managers: the job manager and the VM manager. In  
>> addition,
>> you could add a new manager, the service manager. For example Hedeby
>> (http://hedeby.sunsource.net/
>> ) could be used to request new virtual worker nodes on demand when
>> the number of pending jobs exceeds a given threshold.
> I have considered this option too - there was an article recently  
> about using SGE in an Amazon EC2 cluster using Hedeby (as you  
> suggest) which was quite interesting (http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2 
> ). Again it seems "neater" to be able to use a single scheduler to  
> manage both "levels". The ability to run VMs AND execution jobs  
> concurrently on the same hardware has its advantages with many-core  
> machines.

Well, it depends on what you are interested in. Option A is a good  
solution if you are interested in virtualization as a way to ensure a  
given execution environment for the jobs.
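In its simplest form, the on-demand growth of option B could be a watchdog along these lines (the threshold, template file name, and use of the OpenNebula onevm command are assumptions for illustration):

```shell
#!/bin/sh
# Watchdog sketch: request one more virtual worker node when the number
# of pending SGE jobs exceeds a threshold.

THRESHOLD=10

# Pure decision logic, kept separate so it is easy to test:
need_new_node() {
    # $1 = pending job count, $2 = threshold
    [ "$1" -gt "$2" ]
}

# "qstat -s p" lists pending jobs; the first two lines are headers.
PENDING=$(qstat -s p 2>/dev/null | tail -n +3 | wc -l)

if need_new_node "$PENDING" "$THRESHOLD"; then
    # Ask the VM manager for another worker node (OpenNebula CLI):
    onevm create worker-node.template
fi
```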

Thanks for your words in Spanish!

> Muchas gracias por su respuesta y buena suerte con OpenNebula!  
> [Many thanks for your reply and good luck with OpenNebula!]
> Daire
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88223
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list