[GE users] Virtualization and GridEngine
Ignacio Martin Llorente
llorente at dacya.ucm.es
Thu Nov 6 22:39:58 GMT 2008
> ----- "Ignacio Martin Llorente" <llorente at dacya.ucm.es> wrote:
>> 1. VM structure: The definition of a VM requires a special treatment:
>> images with fixed and variable parts for migration, contextualization
> I was thinking that this could be achieved by using a specially
> configured queue in SGE and custom "method" scripts. The host OS
> could run normal execution jobs in its normal queue and VMs could
> be started in the special queue - maximum total slots would be
> managed with an RQS. Of course I'm probably not fully understanding
> the finer points of VM management and simplifying matters somewhat.
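A resource quota set along those lines might look like this (a sketch only; the queue name vm.q and the slot count are invented for illustration, and the set would be loaded with qconf -arqs):

```
{
   name         vm_slots
   description  "Cap the total slots available to the VM queue"
   enabled      TRUE
   limit        queues vm.q to slots=8
}
```

This would cap VM jobs at 8 slots cluster-wide while leaving the normal execution queue untouched.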
My comment is more related to the attributes required to define a VM
(image file attributes, network attributes, input device, booting ...).
>> 2 VM life-cycle: VM management requires fixed and transient states
>> for contextualization, live migration…
> I agree the live migration example is not easy in SGE. However it
> may be possible to consider it as a "loose" integration problem. To
> migrate to a new machine start a new job on that machine which then
> sucks the job from the old machine. Maybe the old machine is
> constantly checking the status of the running VM and automatically
> quits once it detects the VM has been migrated - or something.
>> 3 VM duration: VM runs for very long time periods ("forever")
> Are long running jobs problematic in SGE? I understand that there is
> a big difference between running short dynamic execution host VMs
> and high availability server VMs - you really need the server VMs to
> be extremely robust. But there may be some benefits in being able to
> blur the lines between a compute farm and a server farm - a
> specialised application of hardware consolidation.
I see that you have in mind the application of VMs to provide jobs
with a pre-defined execution environment.
>> 4 VM groups (services): VMs are not independent entities. A service
>> consists of a group of interconnected VMs (not an array in the job
>> management sense)
> I suppose acting on groups of jobs could be done with job name
> wildcards? So to suspend a group of VMs: qmod -sj 'servers-*'.
> Adding and
> removing VMs from the group is just an exercise in renaming jobs
> (qalter -N). Perhaps parallel jobs can also be used to define VM
> groups? In this case, however, I'm not sure you can dynamically add
> and remove jobs/VMs.
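The group operations described above might be sketched as follows (the job-name prefix servers- and the job ID are illustrative, and the commands require a running SGE installation):

```
# Suspend all VM jobs in the "servers" group by name wildcard
qmod -sj 'servers-*'

# Resume the group
qmod -usj 'servers-*'

# "Add" a running VM (job 4711, illustrative) to the group by renaming it
qalter -N servers-web01 4711
```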
Notice that an example of service is a full SGE cluster, where you
have to create a VLAN between the running VMs, and could define an
ordering in the booting of the VMs.
>> 5 VM elasticity: Groups of VMs can grow to satisfy a given SLO, or
>> you could even dynamically update the memory or CPU requirements of a
>> running VM
> I agree this might be hard to do satisfactorily. Once a job starts
> running in SGE, changing its resource requirements doesn't have any
> effect. You would also have to suspend/restart or migrate the job
> for it to take effect.
>> 6 Finally, the aim of the scheduling heuristics is different. While
>> in job management we try to optimize performance criteria such as
>> turnaround time, throughput…; in VM management, we focus on capacity
>> provision, for example
>> probability of SLA violation for a given cost of provisioning
>> including support for server consolidation, partitioning…
> I would have thought some custom heuristics could be added to
> execution hosts via "load_sensors" but I do see your point. I have
> obviously not given the subject of SLAs the kind of thought that you
> have during the development of OpenNebula. I also have no experience
> of Hedeby and what it is capable of in this respect.
>> That does not mean that virtualization cannot be integrated with job
>> managers. I know two approaches:
>> A. VMs to Provide pre-Created Software Environments for Jobs
>> As described by Andreas, some job managers provide extensions of job
>> execution managers to create VMs on a per-job basis so as to provide a
>> defined environment for job execution. Those approaches still manage
>> jobs, and the VMs are bound to a given host and only exist during the
>> job's execution.
> I think this is probably more the usage I would be interested in. If
> we have legacy applications that need to be run on the farm from
> time to time, it would be useful to start the (old) OS image; using a
> job dependency with a hard job resource request, it should be
> possible to start a VM which also runs an SGE client within it (or
> use ssh as Andreas suggested). Despite the long startup time of the
> VM and the reduced network performance this could be a neat way to
> automatically and dynamically provision the OS running on the farm.
> It also opens up the possibility of users running their own custom
> images à la cloud/EC2 computing. Using Hedeby to reboot machines into
> different images is another approach but I like the idea that
> "normal" jobs and VMs can cohabit the same machine if slots are
We have also used this approach at Grid level. We were involved in a
project to deploy a Globus based Grid for the execution of
applications within pre-defined environments. See "Management of
Virtual Machines on Globus Grids Using GridWay" at http://dsa-research.org/doku.php?id=publications:grid:virtualization
>> B. Job Managers on top of a Virtualized infrastructure
>> An SGE cluster service can run on top of a virtual infrastructure,
>> managed for example by OpenNebula, see:
>> Notice that this approach provides a full separation between the
>> service and the infrastructure. In other words, you run two
>> independent managers: the job manager and the VM manager. In
>> addition, you could add a new manager, the service manager. For
>> example, Hedeby could be used to request new virtual worker nodes on
>> demand when the number of pending jobs exceeds a given threshold.
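The threshold logic behind such a service manager can be sketched in a few lines of Python (an illustration only: it assumes the plain-text layout of `qstat -s p`, i.e. two header lines followed by one line per pending job, and leaves the actual provisioning call as a hypothetical hook toward OpenNebula or Hedeby):

```python
def count_pending_jobs(qstat_output: str) -> int:
    """Count job lines in `qstat -s p` output, skipping the two header lines."""
    lines = [line for line in qstat_output.splitlines() if line.strip()]
    return max(0, len(lines) - 2)


def needs_new_worker(qstat_output: str, threshold: int) -> bool:
    """True when the number of pending jobs exceeds the given threshold,
    i.e. when a new virtual worker node should be requested."""
    return count_pending_jobs(qstat_output) > threshold
```

A monitoring loop would feed this the output of `qstat -s p` and, when it returns True, call whatever the VM manager offers for starting a new virtual worker.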
> I have considered this option too - there was an article recently
> about using SGE in an Amazon EC2 cluster using Hedeby (as you
> suggest) which was quite interesting (http://wiki.gridengine.info/wiki/index.php/SGE-Hedeby-And-Amazon-EC2
> ). Again it seems "neater" to be able to use a single scheduler to
> manage both "levels". The ability to run VMs AND execution jobs
> concurrently on the same hardware has its advantages with many-core
> machines.
Well, it depends on what you are interested in. Option A is a good
solution if you are interested in virtualization as a way to ensure a
given execution environment for the jobs.
Thanks for your words in Spanish!
> Muchas gracias por su respuesta y buena suerte con OpenNebula!
To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].