What happened to the OS?
As I listened to some rumblings on a podcast about the soon-to-be-released Microsoft virtualization product, a thought came to mind: what happened to the operating system?
This isn’t the first time that this thought, or something related to it, has popped into my head. When I was first exposed to virtualization technology, I had the same question. Isn’t it the job of the operating system to manage the underlying physical computing resources effectively? Of course, I also realized how much I had ignored the purpose of the operating system in my own work as a Java developer and designer. I had become so embedded in the world of Java application servers that I had a difficult time getting my head back on straight when trying to guide some .NET server-side development. I kept trying to find something to equate to the Java Application Server that sat on top of Windows Server, when in fact Windows Server is the application server. After all, operating systems manage the allocation of resources to system processes, which are applications.
Now, this isn’t a post to knock virtualization technology. There are many benefits that it can provide, such as the ability to move applications around independently of the underlying physical infrastructure (i.e., they’re not bound to one physical server). But if we look at the primary benefit being touted, it’s better resource utilization. Before we jump to virtualization, shouldn’t we understand why the existing operating systems can’t do their job to begin with? If our current processes for allocating applications to operating systems are fundamentally flawed, is virtualization technology merely a band-aid on the problem?
Based on my somewhat limited understanding of VMs, you can typically choose between dedicating resources to a VM Guest, allowing the VM Guest to pull from a shared pool, or some combination of the two. Obviously, if we dedicate resources completely, we’re really not much better off than with physical servers from a resource utilization standpoint (although I agree there are other benefits outside of resource management). To gain the potential for improved resource utilization, we need to allow the VM Host to reclaim resources that aren’t being used and give them to other processes. At that extreme, we run the risk of thrashing as the VM Guests battle for resources. The theory is that the VM Guests will need resources at different times, so the thrashing that could theoretically occur won’t. So, we probably take some hybrid approach. Unfortunately, that still leaves the risk of wasting resources in the dedicated portion.
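To make that trade-off concrete, here’s a minimal sketch of the hybrid model. It isn’t modeled on any particular hypervisor’s API, and the guest names, reservations, and demand figures are made up for illustration: each guest keeps a dedicated reservation and can burst into whatever shared pool is left over on the host.

```java
import java.util.List;

public class HybridAllocationSketch {

    // Each guest gets a dedicated reservation and has some actual demand.
    record Guest(String name, int reservedMb, int demandMb) {}

    public static void main(String[] args) {
        int hostMemoryMb = 8_192;                        // total physical memory on the host
        List<Guest> guests = List.of(
                new Guest("app-a", 2_048, 6_000),        // demand spikes well above its reservation
                new Guest("app-b", 2_048, 5_500),        // spikes at the same time as app-a
                new Guest("app-c", 2_048, 1_000));       // idle guest: its reservation sits unused

        int reservedTotal = guests.stream().mapToInt(Guest::reservedMb).sum();
        int sharedPoolMb = hostMemoryMb - reservedTotal; // what's left over for bursting

        for (Guest g : guests) {
            int burstWanted = Math.max(0, g.demandMb() - g.reservedMb());
            int burstGranted = Math.min(burstWanted, sharedPoolMb);
            sharedPoolMb -= burstGranted;
            System.out.printf("%s: reserved=%dMB, wanted burst=%dMB, granted=%dMB%n",
                    g.name(), g.reservedMb(), burstWanted, burstGranted);
        }
        System.out.printf("Shared pool remaining: %dMB%n", sharedPoolMb);
        // When the pool hits zero while guests still want more, they contend for
        // it -- the thrashing risk. Meanwhile the idle guest's dedicated 2GB is
        // wasted, which is the downside of the reserved portion.
    }
}
```

Run against these made-up numbers, app-a soaks up the entire shared pool, app-b gets nothing for its spike, and app-c’s dedicated 2GB sits idle: exactly the waste-versus-thrashing tension described above.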
The real source of the problem, in my opinion, is that we do a lousy job of understanding the resource demands of our solutions. We use conservative ballpark estimates, choose some standard configuration and number of app servers, do a capacity test, and send it out into the wild. Once it’s in the wild, we don’t collect metrics to see whether the real usage (which is probably measured in site visits, not in resources consumed) matches the expected usage, and even if it comes in lower, we certainly won’t scale back, because we now have “room to grow”. If we don’t start measuring actual consumption, we’re still going to have less than optimal resource utilization, whether we use VMs or not. I don’t believe that going to a 100% shared model is the answer either, unless the systems get much more intelligent and take past trends into account when deciding whether to take resources away from a given VM Guest.
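As an illustration of what collecting those metrics could look like at the simplest level, here’s a sketch that uses only the standard java.lang.management API to sample actual load and heap use and log it alongside the sizing numbers the solution was capacity-tested against. The “expected” figures are invented placeholders; a real setup would feed the samples into whatever monitoring the operations team already runs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UsageSampler {

    // Hypothetical figures from the original ballpark estimate / capacity test.
    private static final double EXPECTED_LOAD_AVG = 4.0;
    private static final long EXPECTED_HEAP_BYTES = 2L * 1024 * 1024 * 1024;

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Sample once a minute and log actual vs. expected consumption.
        scheduler.scheduleAtFixedRate(() -> {
            double loadAvg = os.getSystemLoadAverage();   // -1 if unavailable on this platform
            long heapUsed = memory.getHeapMemoryUsage().getUsed();
            System.out.printf("loadAvg=%.2f (expected %.2f), heapUsed=%dMB (expected %dMB)%n",
                    loadAvg, EXPECTED_LOAD_AVG,
                    heapUsed / (1024 * 1024), EXPECTED_HEAP_BYTES / (1024 * 1024));
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```

The point isn’t the sampling itself; it’s that the comparison against the original estimate is what tells you whether you actually have “room to grow” or resources to give back.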
Again, this post isn’t a knock on virtualization. One area that I hope virtualization, or more specifically the hypervisor, will address is the bloat of the OS. Part of the resources go to the operation of the OS itself, and one can argue that there are a lot of things in there we don’t need. While we can try to configure the OS to turn those things off, effectively a black-list approach (everything is on unless it appears on the black list), I prefer the white-list approach: start with the minimal set of capabilities needed (the hypervisor) and then turn other things on only if you need them. I expect we’ll see more offerings like BEA’s WebLogic Virtual Edition that simply cut the bloated OS out of the picture. But, as I’ve said, that only gets us so far if we don’t do a better job of understanding how our solutions consume resources from the beginning.
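For what it’s worth, the black-list versus white-list distinction is easy to sketch in code. The service names below are invented and this isn’t tied to any real OS configuration mechanism; it just shows how the two approaches start from opposite defaults.

```java
import java.util.HashSet;
import java.util.Set;

public class ServiceConfigurationSketch {
    public static void main(String[] args) {
        Set<String> allServices = Set.of("print-spooler", "indexing", "remote-desktop",
                "web-server", "app-runtime");

        // Black-list style: everything the OS ships with is on; we switch off
        // only what we remember to put on the list.
        Set<String> blackListed = Set.of("print-spooler", "indexing");
        Set<String> runningBlackList = new HashSet<>(allServices);
        runningBlackList.removeAll(blackListed);

        // White-list style: start from the minimal set and add only what the
        // solution actually needs.
        Set<String> runningWhiteList = new HashSet<>(Set.of("app-runtime"));
        runningWhiteList.add("web-server");

        System.out.println("Black-list result: " + runningBlackList);
        System.out.println("White-list result: " + runningWhiteList);
    }
}
```

With the white-list style, anything not explicitly added never runs, which is the property the hypervisor-based approach is after.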