Virtual Machines in the Data Center
Explore the role of virtual machines in data centers and how they influence distributed system performance. Learn about resource oversubscription, application design challenges, and clock synchronization problems to build resilient, efficient software.
Virtualization
Virtualization promised developers a common hardware appearance across the bewildering array of physical configurations in the data center. It promised data center managers that it would rein in “server sprawl” and pack all those extra web servers running at 5 percent utilization into a high-density, high-utilization, easily managed whole. Guess which story turned out to be more compelling?
On the downside, performance is much less predictable. Many virtual machines can reside on the same physical host. It’s rare to see VMs move from one host to another, because it’s disruptive to the guest. (The “host operating system” is the one that really runs on hardware and provides the virtualization features. “Guest operating systems” run in the virtual machines.)
Physical hosts are usually oversubscribed. That means the physical host may have 16 cores, but the total number of cores allocated to VMs on the host is 32. That host would be 200 percent subscribed or 100 percent oversubscribed. If all those applications receive requests at the same time, just through random chance, then there’s not enough CPU to go around. Almost any resource on the host can be oversubscribed, especially CPU, RAM, and network. Regardless of resource, the result is always the same: contention among VMs and random slowdowns for all. It’s virtually impossible for the guest OS to monitor for this.
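The subscription arithmetic above can be written out as a small sketch. The function names here are hypothetical and chosen only for illustration; they don't come from any virtualization tool.

```python
def subscription_percent(physical_cores: int, allocated_vcpus: int) -> float:
    """Return allocated vCPUs as a percentage of physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return 100.0 * allocated_vcpus / physical_cores


def oversubscription_percent(physical_cores: int, allocated_vcpus: int) -> float:
    """Return how far the allocation exceeds physical capacity (0 if it does not)."""
    return max(0.0, subscription_percent(physical_cores, allocated_vcpus) - 100.0)


# The host from the text: 16 physical cores, 32 cores allocated to VMs.
print(subscription_percent(16, 32))      # 200.0
print(oversubscription_percent(16, 32))  # 100.0
```

The same ratio applies to RAM or network bandwidth if we substitute the allocated and physical amounts of that resource.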
Designing applications for VMs
When designing applications to run in virtual machines (meaning pretty much all applications today), we need to make sure that they’re not sensitive to the loss or slowdown of any one host. That’s just a good idea anyway, but it’s particularly important here. Here are some things to watch out for:
- Distributed programming techniques that require synchronous responses from the whole cluster for work to proceed.
- “Special” machines like cluster managers or lock managers, unless another machine can take over without reconfiguration.
- Subtle dependency on request or event ordering. Nobody designs this into a system, but it can creep in unexpectedly.
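One way to avoid the first pitfall is to proceed once a quorum of hosts has replied, rather than waiting for the whole cluster. The following is a minimal sketch of that idea, not a production pattern. The names `call_with_quorum` and `QuorumNotReached` are hypothetical, and the `fast` and `slow` functions stand in for remote calls to other hosts.

```python
import concurrent.futures as cf
import time


class QuorumNotReached(Exception):
    """Raised when too few calls succeed before the deadline."""


def call_with_quorum(tasks, quorum, timeout_s):
    """Run tasks in parallel and return as soon as `quorum` of them succeed."""
    pool = cf.ThreadPoolExecutor(max_workers=len(tasks))
    futures = [pool.submit(task) for task in tasks]
    results = []
    try:
        for future in cf.as_completed(futures, timeout=timeout_s):
            if future.exception() is None:
                results.append(future.result())
            if len(results) >= quorum:
                return results
    except cf.TimeoutError:
        pass  # Deadline passed; fall through and report the shortfall.
    finally:
        # Do not block on stragglers; slow calls finish in the background.
        pool.shutdown(wait=False)
    raise QuorumNotReached(f"only {len(results)} of {quorum} required replies arrived")


def fast():
    return "ok"


def slow():
    time.sleep(0.5)  # Simulates a VM that is suspended or starved for CPU.
    return "late"


# Two of three hosts answer quickly, so work proceeds without the slow one.
replies = call_with_quorum([fast, fast, slow], quorum=2, timeout_s=0.2)
print(replies)  # ['ok', 'ok']
```

With this shape, one slow or lost host delays nothing as long as enough of the others respond. Whether a quorum is acceptable at all depends on the consistency guarantees the application needs.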
Virtual machine clock problem
Virtual machines make all the problems with clocks much worse. Most programmers carry a mental model of the clock as being monotonic and sequential. That is, a program that samples the system clock may get the same value twice but it’ll never get a value less than a prior response. It turns out that’s not even true for a clock on a physical machine.
But on a virtual machine it can be much worse. Between two calls to examine the clock, the virtual machine can be suspended for an indefinite span of real time. It might even be migrated to a different physical host that has a clock skew relative to the original host. A clock on a virtual machine is not necessarily monotonic or sequential.
The virtualization tools try to paper over this with a little communication from the VM to query the host so the VM can update its OS clock whenever it wakes up. That keeps the VM’s OS clock synced with the host’s OS clock. From an application perspective, this makes the clock jump around even more. The bottom line is: don’t trust the OS clock. If external, human time is important, use an external source like a local NTP server.
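To make the jumping clock concrete, here is a small sketch that compares how far the wall clock moved against how far a monotonic clock moved between two samples. The `JumpDetector` class and its one-second tolerance are hypothetical. How the monotonic clock behaves while a VM is suspended varies by platform and hypervisor, so treat this as a way to notice jumps, not as a source of correct time.

```python
import time


class JumpDetector:
    """Compare wall-clock progress against a monotonic clock between samples."""

    def __init__(self, wall=time.time, mono=time.monotonic, tolerance_s=1.0):
        self._wall, self._mono = wall, mono
        self._tolerance_s = tolerance_s
        self._last_wall, self._last_mono = wall(), mono()

    def check(self):
        """Return the wall-clock jump in seconds, or 0.0 if within tolerance."""
        now_wall, now_mono = self._wall(), self._mono()
        jump = (now_wall - self._last_wall) - (now_mono - self._last_mono)
        self._last_wall, self._last_mono = now_wall, now_mono
        return jump if abs(jump) > self._tolerance_s else 0.0


# Fake clocks make the behavior repeatable; real code would use the defaults.
wall_samples = iter([1000.0, 1005.0, 1002.0])
mono_samples = iter([50.0, 55.0, 60.0])
detector = JumpDetector(wall=lambda: next(wall_samples),
                        mono=lambda: next(mono_samples))

print(detector.check())  # 0.0: both clocks advanced by 5 seconds
print(detector.check())  # -8.0: wall clock fell 3 s while 5 s elapsed
```

A negative result means the wall clock was stepped backward relative to elapsed time, which is exactly the case that breaks code assuming timestamps only increase.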
In virtualization, which of the following is the least predictable?
- Performance
- Speed
- Memory