[GE users] Workload management and virtualization

Ignacio Martin Llorente llorente at dacya.ucm.es
Mon Nov 10 17:32:50 GMT 2008


Let me add my comments to this relevant discussion about architectural  
alternatives for integration of job and VM management.

As a general comment, if you are interested in using virtualization to  
provide jobs with a pre-defined execution environment, I fully agree  
that the best approach is a tight integration of job management and  
virtualization, so that an end user can define the VM in which to run his
jobs, for example by using a new attribute in the template. However, if you
are interested in running several services (for example two computing
clusters, or a computing cluster and a web server) on the same
cluster, I believe that you should separate the infrastructure
management from the job management. The two layers have different aims,
and you can easily define infrastructure-level heuristics (for
consolidation, for example) that are independent of the job scheduling
policies. The submission of VMs encapsulating SGE execution nodes can  
be done on-demand and of course with different pre-defined  

>>> there is currently a very interesting discussion ongoing concerning
>>> GridEngine and virtualization.
>>> I want to invite you to a little 'Thought Experiment'. From your point
>>> of view, what do you think an ideal integrated solution for
>>> workload management *and* management of virtual resources should look
>>> like? Do you think this would be a good idea at all? Don't be shy and
>>> feel free to come up with a long wish list, Christmas is coming ;-)
> The main question that interested me was whether organisations with  
> compute farms can use the same system (e.g. SGE) and resources to  
> manage a VM server cluster. Applications like OpenNebula and oVirt
> are designed to manage such clusters but they work independently of  
> the job scheduler and so I feel that the actual compute resources  
> cannot be fully maximised automatically.

Why? OpenNebula allocates resources to VMs according to the capacity
requested by each VM. The only problem with the virtualization of
computing nodes is the virtualization overhead. In our workloads it is
about 10%.
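
To make "allocation by requested capacity" concrete, here is a minimal
sketch of a first-fit placement. It is only an illustration and is not
OpenNebula's scheduler: the Host and VM classes, the place() and
effective_cores() helpers, and the node and VM names are all hypothetical.
The 10% default overhead is the figure quoted above for our workloads.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu: float      # free CPU, in cores
    mem_mb: int     # free memory, in MB
    vms: list = field(default_factory=list)


@dataclass
class VM:
    name: str
    cpu: float      # requested CPU, in cores
    mem_mb: int     # requested memory, in MB


def place(vms, hosts):
    """First-fit: put each VM on the first host with enough free capacity.

    Returns a dict mapping VM name to host name (None if nothing fits).
    Host objects are updated in place as capacity is consumed.
    """
    placement = {}
    for vm in vms:
        placement[vm.name] = None
        for host in hosts:
            if host.cpu >= vm.cpu and host.mem_mb >= vm.mem_mb:
                host.cpu -= vm.cpu
                host.mem_mb -= vm.mem_mb
                host.vms.append(vm.name)
                placement[vm.name] = host.name
                break
    return placement


def effective_cores(cores, overhead=0.10):
    """Cores' worth of work a virtualized node delivers for a given overhead."""
    return cores * (1.0 - overhead)


# Hypothetical pool: two 8-core nodes, two SGE execution-node VMs and a web server.
hosts = [Host("node1", cpu=8, mem_mb=16384), Host("node2", cpu=8, mem_mb=16384)]
vms = [VM("sge-exec-1", 4, 8192), VM("sge-exec-2", 6, 4096), VM("web", 2, 2048)]
print(place(vms, hosts))
print(effective_cores(8))
```

A real placement policy would also weigh consolidation, load and image
locality; the point is only that this decision needs nothing from the job
scheduler.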

> There are two main groups of VM; short execution environments in  
> which you run specific jobs (it can run an execd but as part of a  
> special queue) and long running "server" VMs which may be virtual  
> appliances or are execution hosts whose OS can dynamically change  
> depending on the pending requirements. Rebooting, reinstalling or  
> partitioning the farm up depending on OS is usually quite a manual,
> slow and inefficient operation.
> Some example cases I can think of -
> * Your compute farm mostly runs normal jobs (through SGE) but when  
> you underutilise the resources other organisations with their own  
> custom OS images (and even schedulers) can use up the free cputime.  
> Essentially you could rent your unused capacity in a similar way to  
> Amazon's EC2. They are billed by what they use. Things like Globus  
> spend much of their time ensuring that remote execution environments  
> are similar to yours - VMs are much easier IMHO.
> * Desktop machines can become part of the compute resources during  
> periods of idleness without having to guarantee that their OS image  
> is the same as that running on the permanent compute farm. These can  
> be migrated or suspended when the user returns.
> * Long running simulations can be periodically checkpointed (e.g. in  
> case of a power outage) without having to program in specialised  
> checkpointing interfaces. Third party commercial applications can  
> also be checkpointed easily.
> * Under extreme server cluster load the VMs can spill over onto the  
> compute farm if required (server VMs would be very high priority  
> jobs).
> * Rolling out a new OS image across the compute farm hardly ever  
> happens in one go as some departments have not yet migrated their
> code over. So roll out the new VM-capable image but automatically
> spawn VMs of the older OS image when those departments need to run  
> their older software. Perhaps something like Hedeby can start and  
> stop these VM jobs when it detects that there are jobs which require  
> them. You could automatically reboot the machines between OS images  
> too but it seems somewhat more wasteful - what if you only need a  
> single 2 slot job to run on the old OS but your hosts are 8 slot  
> machines? It would be better to keep the other 6 slots available for  
> the newer OS jobs.
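
The slot arithmetic in that last case can be written down as a small
sketch. The function names are hypothetical, and it assumes that the
old-OS VM is sized to exactly the slots its pending jobs need and that no
other old-OS jobs are waiting.

```python
def split_slots(host_slots, legacy_job_slots):
    """Return (legacy_vm_slots, native_slots) for one host.

    The old-OS VM gets exactly the slots its pending jobs need; everything
    else stays available to jobs running on the host's new OS.
    """
    if not 0 <= legacy_job_slots <= host_slots:
        raise ValueError("legacy jobs do not fit on this host")
    return legacy_job_slots, host_slots - legacy_job_slots


def idle_after_reboot(host_slots, legacy_job_slots):
    """Slots left idle if the whole host is rebooted into the old OS instead."""
    return host_slots - legacy_job_slots


# The case from the quoted mail: one 2-slot job on an 8-slot host.
print(split_slots(8, 2))        # (2, 6): 6 slots remain for new-OS jobs
print(idle_after_reboot(8, 2))  # 6: slots idle if the host is rebooted instead
```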

Your example cases are infrastructure related, not job management related:
* Sharing of underutilized resources
* Fault tolerance (power outage)
* Dynamic partitioning of a farm

In fact, they are some of the "typical" use cases used in the
promotion of distributed VM managers (see the Benefits section at
http://www.opennebula.org/doku.php?id=about). Of course, you could
modify SGE to be a distributed VM manager, but my position is that you
would have to modify many internals, because the efficient management of
VMs is different from the efficient management of jobs. In other words,
you cannot have one technology that efficiently manages both jobs and VMs.
> Obviously the big thing missing from all of this is a nice GUI to  
> manage VMs like OpenNebula but if something like SGE can provide all  
> the required functionality then creating custom GUIs (like many  
> already do for things like "qstat" and "qalter") is fairly trivial.  
> Or perhaps, like Hedeby, it is something that Sun would be  
> interested in developing at some point.

OpenNebula does not provide a GUI; it is focused on the dynamic
management of VMs on a distributed pool of resources. It has been
designed to meet the functionality and scalability requirements of
UNIX clusters. In fact the CLI is similar to that provided by job  
managers: submit VMs, monitor running VMs, monitor resources, control  

>> What we would like to have, is a checkpointing (& migration) facility
>> for long running applications - even for applications where only the
>> binaries are available.
> The checkpointing suspend_method in SGE should be easy enough to  
> configure to suspend and resume jobs/VMs but obviously there is  
> currently no inbuilt way of migrating jobs. As I mentioned in the  
> other email thread there are probably ways to devise such a feature  
> by launching a new job that once running sucks the VM from another  
> machine also causing the original job to quit.
>> Maybe it's not necessary to run a complete virtual machine for each
>> slot (and having one execd for the virtual machine and an additional
>> one inside the virtual machine), but to emulate only some layer of a
>> virtual machine. The sge_shepherd becomes a sge_virtualizer with a
>> tighter integration of the outer and inner world. This would also
>> make it possible to send e.g. signals from the outer machine to the
>> program running in the sge_virtualizer. As VirtualBox is not only open
>> source, but also now owned by SUN, maybe there are good options to
>> combine it.
> I think that the VM would, in most cases, be a "parallel" job using  
> a fair proportion of the actual CPUs - single slot/cpu VMs are  
> probably not the most efficient use of RAM for example. Saying that  
> until virtualised drivers (network/storage) can replicate the bare  
> metal native performance it may be better to have multiple VMs per  
> physical host. Running an execd inside the VM and using queue  
> configurations and job resource requests should be good enough to  
> manage jobs within the VM assuming you use a fully bridged network.  
> If you can't or don't want to run an execd within the VM then an  
> interface to ssh into the VM and run jobs may be useful. Perhaps you  
> could have an extra accounting dependency that knows that a hostname  
> is actually a VM subset of another hostname. Qstat could then be  
> made to report all jobs running on "host1" even if some of them are  
> running within a locally networked VM "vm-host1" - it gets  
> complicated quickly!
> VirtualBox is the Sun favorite but libvirt is becoming a pretty good  
> hypervisor interface standard on Linux which hopefully Sun will  
> support at some stage. I'm more of a KVM fan at the moment - Xen was  
> just too complicated to package and maintain.

We use libvirt for our driver to access KVM, and we also provide a new
libvirt interface on top of OpenNebula
(http://trac.opennebula.org/wiki/LibvirtOpenNebula), yes, on top of a
distributed infrastructure.
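
On the checkpointing and migration wish quoted earlier: libvirt's virsh
command line already has save, restore and migrate sub-commands, so
checkpointing a whole VM does not require support inside the application.
The following is a rough sketch that only builds and prints such commands.
It is not how the OpenNebula driver is implemented; the helper names, the
domain name "sim-vm", the state file path and the qemu:///system URI are
assumptions made for illustration.

```python
import subprocess

LIBVIRT_URI = "qemu:///system"   # assumption: a local KVM hypervisor


def checkpoint_cmd(domain, state_file):
    """virsh command that saves a running VM's state to a file and stops it."""
    return ["virsh", "-c", LIBVIRT_URI, "save", domain, state_file]


def restart_cmd(state_file):
    """virsh command that resumes a VM from a previously saved state file."""
    return ["virsh", "-c", LIBVIRT_URI, "restore", state_file]


def migrate_cmd(domain, dest_host):
    """virsh command that live-migrates a VM to another libvirt host over ssh."""
    return ["virsh", "-c", LIBVIRT_URI, "migrate", "--live", domain,
            "qemu+ssh://%s/system" % dest_host]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


# Dry run with hypothetical names: nothing is executed here.
run(checkpoint_cmd("sim-vm", "/var/tmp/sim-vm.state"))
run(restart_cmd("/var/tmp/sim-vm.state"))
run(migrate_cmd("sim-vm", "node2"))
```

Scripts like these could be called from whichever layer owns the VM, be it
a job manager's checkpoint hooks or a VM manager.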

BTW, we would like to know if somebody is interested in VirtualBox for  
the virtualization of computing clusters, because we are evaluating  
the development of drivers for this virtualization platform.


> Daire
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88353
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

