A Brief Discussion of Security with Regard to Resource Over-Commitment in VMware

There are two schools of thought when it comes to physical memory over-commitment between virtual machines.

The first school of thought is that it is a great way for virtual machines to use more memory than the host server physically has: the memory configured for the guest operating systems exceeds the physical memory of the host. For example:

Host server: 64 GB RAM

10x VMs: 2 GB reservation, 8 GB limit each

Memory reserved for powered-on VMs: 20 GB RAM (10 x 2 GB)

Memory available to the guest OSes: 80 GB RAM (10 x 8 GB)

Obviously our virtual machines cannot access what is not there, but most machines do not use all of their available memory at any given time. Each VM therefore has 2 GB permanently (as long as it is powered on), and the remaining 44 GB are left for the VMs to ‘share’. This is called resource over-commitment, and it is made possible in part by what VMware calls its balloon driver which, I must admit, is pretty cool. Because a guest operating system would crash if the amount of memory it sees changed constantly, ESX creates a swap file on the datastore for each VM equal to the VM's configured memory minus its reservation (8 GB minus 2 GB, or 6 GB, in this example). When physical memory is not available, the swap file stands in for all or part of the VM's memory.
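The arithmetic in this example can be checked with a few lines of Python. This is a sketch under the same simplifying assumptions as the article (no host overhead, shares, or priorities); the `Vm` class and its field names are hypothetical, and the swap-file size follows the rule described above (configured memory minus reservation).

```python
from dataclasses import dataclass


@dataclass
class Vm:
    configured_gb: int   # memory the guest OS sees (the 8 GB limit here)
    reservation_gb: int  # memory guaranteed while the VM is powered on

    @property
    def swap_file_gb(self) -> int:
        # Per-VM swap file: configured memory minus reservation.
        return self.configured_gb - self.reservation_gb


HOST_GB = 64
vms = [Vm(configured_gb=8, reservation_gb=2) for _ in range(10)]

reserved_gb = sum(vm.reservation_gb for vm in vms)
shared_pool_gb = HOST_GB - reserved_gb  # what the VMs 'share'
configured_gb = sum(vm.configured_gb for vm in vms)
overcommit_ratio = configured_gb / HOST_GB

print(f"Reserved for powered-on VMs: {reserved_gb} GB")        # 20 GB
print(f"Left for the VMs to share:   {shared_pool_gb} GB")     # 44 GB
print(f"Configured across all VMs:   {configured_gb} GB")      # 80 GB
print(f"Over-commitment ratio:       {overcommit_ratio:.2f}")  # 1.25
print(f"Swap file per VM:            {vms[0].swap_file_gb} GB")  # 6 GB
```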

(I should mention that I have severely oversimplified this scenario for the sake of clarity. I am not including factors such as host resource requirements, priorities, and more; they are irrelevant to the point of this article.)

The second school of thought is that memory over-commitment (which obviously implies physical memory being shared or ‘traded’ between virtual machines) is a glaring security hole. For this reason Microsoft’s Hyper-V (both the original release and 2008 R2) does not support over-commitment. So:

Host server: 64 GB RAM

10x VMs: maximum 6.4 GB RAM each

In Hyper-V, the memory allocated to each VM is isolated from the other VMs; it is never shared or traded between them.

In VMware, many workloads present opportunities for sharing memory across virtual machines, which ESX exploits with a feature VMware calls transparent page sharing. For example, several virtual machines may be running instances of the same guest OS, have the same applications or components loaded, or contain common data.
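The sharing described above can be illustrated with a toy model. This is a hypothetical sketch, not VMware's implementation: it assumes 4 KB pages, finds candidate pages by a SHA-256 hash of their contents (confirmed with a full byte comparison), and gives a VM a private copy of a page whenever it writes to it (copy-on-write), so one VM's writes never change what the other VMs read. The class, method, and VM names are invented for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, in bytes


class PageSharingHost:
    """Toy model of content-based page sharing with copy-on-write.

    Hypothetical illustration only, not VMware's implementation.
    Guest pages are keyed by (vm, page_number); host frames are integers.
    """

    def __init__(self) -> None:
        self.frames: dict[int, bytes] = {}  # host frame -> page contents
        self.refs: dict[int, int] = {}      # host frame -> number of guest mappings
        self.shared: dict[bytes, int] = {}  # content hash -> shareable host frame
        self.page_table: dict[tuple[str, int], int] = {}
        self._next_frame = 0

    def _new_frame(self, data: bytes) -> int:
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = data
        self.refs[frame] = 0
        return frame

    def _map(self, vm: str, page: int, frame: int) -> None:
        self.page_table[(vm, page)] = frame
        self.refs[frame] += 1

    def _unmap(self, vm: str, page: int) -> None:
        frame = self.page_table.pop((vm, page))
        self.refs[frame] -= 1
        if self.refs[frame] == 0:  # last mapping gone: free the host frame
            digest = hashlib.sha256(self.frames[frame]).digest()
            if self.shared.get(digest) == frame:
                del self.shared[digest]
            del self.frames[frame], self.refs[frame]

    def load(self, vm: str, page: int, data: bytes) -> None:
        """Map a guest page, reusing an identical host frame when one exists."""
        if (vm, page) in self.page_table:
            self._unmap(vm, page)
        digest = hashlib.sha256(data).digest()
        frame = self.shared.get(digest)
        # Compare full contents too, so a hash collision never merges different pages.
        if frame is None or self.frames[frame] != data:
            frame = self._new_frame(data)
            self.shared[digest] = frame
        self._map(vm, page, frame)

    def write(self, vm: str, page: int, data: bytes) -> None:
        """Copy-on-write: the writer gets a private frame; other VMs keep the old one."""
        self._unmap(vm, page)
        self._map(vm, page, self._new_frame(data))  # private: not added to self.shared

    def read(self, vm: str, page: int) -> bytes:
        return self.frames[self.page_table[(vm, page)]]


host = PageSharingHost()
common = bytes(PAGE_SIZE)  # e.g. an identical zero-filled page in every guest
for vm in ("vm1", "vm2", "vm3"):
    host.load(vm, 0, common)
print(len(host.frames))  # 1: three guest pages backed by one host frame

host.write("vm1", 0, b"malicious".ljust(PAGE_SIZE, b"\0"))
print(len(host.frames))               # 2: vm1 now has a private copy
print(host.read("vm2", 0) == common)  # True: vm2 still sees the original page
```

For simplicity, every write in this model allocates a fresh frame, even when the page was not shared; a real hypervisor would write a private page in place and would periodically rescan memory to re-share pages that have become identical again.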

According to one Microsoft virtualization security expert, Microsoft’s position is that sharing memory creates the potential for hackers to inject code into a driver or common application used by multiple VMs, thus passing the malicious code from the [initially infected] virtual machine into others.

The expert goes on to say that this is all theoretical so far, because to date there have been no known instances of hackers exploiting this hole in the wild.

The next layer to this issue is that there are applications that allow you to patch VMware guest machines ‘on the fly’ in memory. In other words, a hacker who breaches the initial security already has a tool for injecting malicious code into running VMs.

I have always said that the level of security of any system should take into account all reasonable threats, with strong consideration for what the security system is protecting. In other words, while both need a firewall, the solution I implement for my mother’s laptop will look nothing like the solution I implement for an enterprise client with sensitive data.

I think that both Microsoft’s Hyper-V and VMware’s Virtual Infrastructure are excellent virtualization solutions. While you can’t beat the price of Hyper-V, I would never tell a client not to implement an ESX 4.0 Server because of a hypothetical security flaw inherent in over-committing resources.

I will continue to keep my eyes open for this exploit. Ralph Waldo Emerson is credited with saying ‘If you build a better mousetrap, the world will beat a path to your door’;* I do not believe that, and applied to IT security the phrase would be ‘Build a better mousetrap, and the world will make a better mouse.’ One unfortunate result of the improvements in systems security over the years is that hackers have become much smarter, and I suspect it is only a matter of time before this vulnerability is exploited.

ADDITION

Although memory over-commitment is a great way of maximizing, and even extending past, your actual available resources, it should be mentioned that even VMware cautions against relying on it heavily in a production environment. According to a document on its website entitled ‘Performance Tuning Best Practices for ESX Server 3’ (I have not been able to find a similar document for ESX Server 4, but the technology is similar):

Avoid frequent memory reclamation. Make sure the host has more physical memory than the total amount of memory that will be used by ESX plus the sum of the working set sizes that will be used by all the virtual machines running at any one time. (Note: ESX does, however, allow some memory overcommitment without impacting performance by using the memory management mechanisms described in “Resource Management Best Practices” on page 12 [of this document].)
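The guideline quoted above amounts to a simple inequality: host physical memory must exceed ESX's own usage plus the sum of the VMs' working sets. Here is a minimal sketch, assuming you already have estimates of those working sets; the function names, the 2 GB ESX figure, and the working-set sizes in the usage lines are hypothetical, chosen only for illustration.

```python
def memory_headroom_gb(host_gb: float, esx_gb: float, working_sets_gb: list[float]) -> float:
    """Physical memory left over after ESX and all VM working sets."""
    return host_gb - (esx_gb + sum(working_sets_gb))


def avoids_frequent_reclamation(host_gb: float, esx_gb: float, working_sets_gb: list[float]) -> bool:
    """True when the host has more physical memory than ESX plus all working sets."""
    return memory_headroom_gb(host_gb, esx_gb, working_sets_gb) > 0


# The article's host: 64 GB, ten VMs configured for 8 GB each (80 GB in total).
# The 2 GB ESX figure and the working-set sizes are hypothetical.
print(avoids_frequent_reclamation(64, 2, [5] * 10))  # True: 52 GB needed, 12 GB headroom
print(avoids_frequent_reclamation(64, 2, [7] * 10))  # False: 72 GB needed
```

The first case is over-committed on paper (80 GB configured on a 64 GB host) yet still satisfies the guideline, because what matters is the memory the VMs actively use rather than the memory they are configured with.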

One colleague of mine, a Microsoft employee, concedes that resource over-commitment is a great tool for a test/dev environment, but is adamant that he would not use it in production. I would not disagree. However, like so many questions in our field, the real answer is what I call the Universal Consultant’s Answer (UCA): It depends.

–

*This phrase is apparently a misquote; the original passage reads: ‘If a man has good corn or wood, or boards, or pigs, to sell, or can make better chairs or knives, crucibles or church organs, than anybody else, you will find a broad hard-beaten path to his house, though it be in the woods.’


One response to “A Brief Discussion of Security with Regard to Resource Over-Commitment in VMware”

VMwareDave

January 4, 2011 at 4:50 pm

Your article confused me at first, as I thought it was about over-commit; instead it is only about shared memory, which is just one component of memory overcommit. Prior to R2 SP1 (a release candidate at the time of this posting), Microsoft did not support any memory overcommit. With the release of SP1 they now support memory overcommit, but I do not believe they are including shared memory in that support.

The flaw in Microsoft’s argument about shared memory is the way VMware implements copy-on-write for all shared memory. Any write to a block in the shared memory pool is copied to a unique memory address. To save processing time, VMware makes no effort at the time of the write to determine whether the block is identical to existing memory blocks. Instead, a low-priority background process scans for identical blocks; it will eventually identify whether the written block is identical and, if so, add it back into the shared memory pool. In this manner, there is no way for malicious code injected into one VM to become accessible to other VMs. Such malicious code would have to be injected into each VM individually before the blocks where it resides could be identified as identical.

The World According to Mitch