The attractions of virtual computing are many: reduced costs, reduced resources, and simplified maintenance. Any one of these would be compelling for a medical imaging professional attempting to support a complex practice on limited resources in an era of ever-tightening reimbursement. In particular, the ability to run multiple operating systems optimized for different tasks (computational image processing on Linux versus office tasks on Microsoft operating systems) on a single physical machine is compelling. However, there are also potential drawbacks. High-performance applications need careful consideration if they are to run in an environment where the software must pass through multiple layers of device drivers before reaching the real disk or network interface. Our lab has attempted to gain insight into the impact of virtualization on performance by benchmarking the following metrics on both physical and virtual platforms: local memory and disk bandwidth, network bandwidth, and integer and floating point performance. The virtual performance metrics are compared to baseline performance on “bare metal.” The results are complex, and indeed somewhat surprising.
Virtual computing is a term that describes the concept of running one or more virtual computers (aka machines) on top of a single physical computer; the virtual machines (VMs) do not interface directly with any real hardware; rather, the virtual host provides software that mimics the real hardware [1, 2]. The attractions of virtual computing are many: reduced costs, reduced resources, and simplified maintenance. However, there are potential areas where virtual computers may not be advisable. High performance or speed requirements must be carefully considered if the software is to run in an environment where it must pass through multiple layers of device drivers before reaching the real disk or network interface.
Why would the readers of this journal be interested in virtual computing? In the current economic environment, it can be challenging to obtain new physical resources. A department administrator may find it easy to deny a researcher a new physical server if the request competes with more clinically related funding requests. Perhaps an investigator has to choose between a desktop system that will be needed for office-related tasks (grant writing, reports, etc.) and a computer server on a different operating system to do the actual work. Alternatively, a given laboratory may face space and electric power constraints; in our case, the mission assigned to our lab includes maintaining test systems for change management of all our clinical viewing systems. This task alone translates to over 20 servers, and it does not include the research and development work we do. Our lab has neither the space nor the power for 20 physical servers, but we did have the space for two 12-processor servers with 32 GB of memory each and separate redundant storage. However, the decision of how to use those resources is not at all obvious; certainly, all VMs could be hosted on one platform, but will that one platform offer adequate performance for all the VMs?
To quantify the extent to which virtualization harms performance, it is useful to break down the constituents of performance. Basically, a physical computer program in the midst of intense calculations makes use of at least local memory and processor resources; it may also use local disk and network resources. A VM has the same resources, except that they are software “devices” that in turn may be layered on top of:

- a “thin” hypervisor running directly on the physical hardware, or
- a “thick” hypervisor running as an application on a conventional host operating system.
Figure 1 makes this clearer: the first panel shows a classic physical computer with the OS residing directly on the physical hardware (i.e., “bare metal”); the second shows a thin hypervisor which in turn hosts the guest OSs; and the third shows a physical machine hosting a common OS, which hosts the hypervisor, which in turn hosts the VMs.
In this work, we endeavor to measure the following metrics across various combinations of virtual machine environments and host operating systems: local memory and disk bandwidth, network bandwidth, and integer and floating point performance.
The measurement hardware consisted of two identical Dell 690 workstations: 1-Gbit network interface, 8 GB of RAM, 15,000-RPM SCSI disks, and a 2.3-GHz quad-core processor (Dell Corporation, Round Rock, TX). For the file server, we chose FreeNAS Version 7 (http://freenas.org), an optimized open source file server appliance based on 64-bit FreeBSD (http://freebsd.org). The two computers shared a private Gbit switch (Cisco Systems, San Jose, CA). The computer in the client role ran various host operating system configurations as follows:
The virtualization products trialed included:

- VMware ESXi (thin hypervisor),
- Xen (thin hypervisor),
- KVM (thin hypervisor), and
- VirtualBox (thick hypervisor, hosted on a conventional OS).
To standardize the measurement procedure, we built a suite of measurement tools on top of a minimalist instantiation of RedHat V5.5 32-bit. A 32-bit VM was chosen as the benchmark platform for portability: a 32-bit VM can run on either a 32- or 64-bit host OS, while the converse is not true. This is important because a cost-sensitive user running a virtual environment on Microsoft tools may not be able to afford the additional charges incurred for that company’s 64-bit high performance products. Using this base “appliance,” we crafted a suite of tests that measures:

- local memory (RAM) R/W bandwidth,
- local disk R/W bandwidth,
- network disk R/W bandwidth, and
- integer and floating point performance.
Figure 2 in the Appendix shows the script that automated all the tests and reported the results to a file. File input/output performance was measured using the “dd” command that is standard in Linux. Integer performance was measured using the Dhrystone 2 benchmark, built by executing “sh dry.c” [3, 4]. Floating point performance was measured using the Whetstone benchmark compiled with the switches “cc whets.c -o whets -O2 -fomit-frame-pointer -ffast-math -fforce-addr -lm -DUNIX” and activating the setting for double precision [5, 6]. More modern benchmarks exist; the reader may be familiar with “SPECint,” the newer “SPEC CPU,” or other offerings from the Standard Performance Evaluation Corporation (Warrenton, VA). However, these tools are not free, while the source code for the older Dhrystone and Whetstone metrics is.
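For illustration, one pass of such a test suite can be reduced to a few lines of shell. The sketch below uses placeholder paths, sizes, and binary names (with /mnt/nas standing in for a mount of the FreeNAS share) rather than the actual parameters of the Appendix script:

    #!/bin/sh
    # Sketch of a measurement pass; paths, sizes, and names are illustrative.
    SIZE_MB=512

    for target in /dev/shm /tmp /mnt/nas; do   # RAM (tmpfs), local disk, network disk
        # Write bandwidth: dd reports elapsed time and throughput on stderr.
        dd if=/dev/zero of="$target/ddtest" bs=1M count="$SIZE_MB" 2>> results.txt

        # Flush and drop clean page caches so the read below is not served
        # from RAM; tmpfs pages survive this, so the RAM test is unaffected.
        sync
        echo 3 > /proc/sys/vm/drop_caches

        # Read bandwidth: stream the file back to /dev/null.
        dd if="$target/ddtest" of=/dev/null bs=1M 2>> results.txt
        rm -f "$target/ddtest"
    done

    # CPU tests: the Dhrystone and Whetstone binaries built as described
    # above (binary names assumed).
    ./dry2 >> results.txt
    ./whets >> results.txt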
Having built the appliance, we installed it on a flash drive to measure “bare metal” performance (this step is sketched at the end of this section); the client computer booted from the flash device and ran the test suite entirely in system random access memory (RAM). The resulting performance figures represent the baseline performance possible in the “bare metal” configuration, i.e., an operating system running directly on top of the client computer’s physical hardware. We then reconfigured the client computer with various host operating systems, which in turn hosted the various vendors’ virtual computer environments. The base flash drive image was then used to create VMs in each of the virtual computing products. In all cases, the VM implementations consisted of:
The physical image was similar, with the exception that the total system RAM was available to the 32-bit appliance kernel running on the native processor. The results are compiled in the next section.
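As a note on the mechanics of the bare metal baseline above, a raw appliance image can be copied to a bootable flash device with the same dd tool; in this sketch, “appliance.img” and /dev/sdX are placeholders, and the device name must be verified before writing:

    # Copy the appliance image to the flash drive (dd overwrites the target
    # device without confirmation; check /dev/sdX carefully).
    dd if=appliance.img of=/dev/sdX bs=4M
    sync   # flush buffers before removing the drive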
As suggested by Fig. 1, the results can be broken out into three groups based on whether the test appliance was operated on bare metal, a thin hypervisor, or a thick hypervisor residing on a host OS. The accumulated results are tabulated in Table 1.
The following discussion summarizes key points. For example, read/write (R/W) performance in RAM, on local disk, and on network disk constitutes three distinct areas, and the winner is not consistent across them.
The experimental outline pursued herein is aligned with the needs of our lab and the various customers we serve. It is often the case that the lab serves as an “incubator” for departmental projects, and those that prove themselves are promoted to clinical applications that move to the official hospital data center. Because the data center has standardized on VMware ESXi, we have found it most efficacious to perform our base development in that arena. However, one can also see that VMware is not often the performance winner. Fortunately, free tools from VMware (i.e., Converter Standalone Client) make it trivial to convert VMware machines to the Open Virtualization Format (OVF), which can be read by VirtualBox and Xen.
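Such conversions can also be scripted. The sketch below assumes VMware’s free command-line ovftool (a companion to the Converter GUI) is installed, and the file names are placeholders:

    # Export a VMware VM to the Open Virtualization Format.
    ovftool devbox.vmx devbox.ovf

    # Import the resulting OVF appliance into VirtualBox.
    VBoxManage import devbox.ovf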
It is also a frequent requirement of our work to share our results with outside labs that are very cost-sensitive. For this reason, we chose to perform this analysis with products that may not be Free Open Source Software (FOSS) but are at least available without cost. Since we share the resulting VMs with third parties, it is also axiomatic that we must create them on platforms that are based on FOSS licenses; hence, the benchmark VM used here was based on Linux. Others could obviously replicate the current work using a Windows VM benchmark platform; indeed, it would be interesting to see if the noted trends are reproduced.
One would expect, and indeed we did, that the thin hypervisor group would come closest to the bare metal results. However, the results are more complex than that: as one can see from the preceding data, selecting the “best” VM environment depends on the target application’s behavior. Is it compute limited, R/W limited, or a combination of both? It was also somewhat puzzling that write performance (be it on RAM, local disk, or network disk) was sometimes faster on a VM than on bare metal (note the performance of VirtualBox in this regard). In retrospect, however, this should not have been so surprising. In an OS on bare metal, write performance is entirely gated by the input/output (I/O) performance of the real OS, whereas a VM’s memory manager may employ newer and more efficient buffering algorithms than the real OS can when writing to a slower physical I/O system. This cannot be done in the case of reads; the entire path to the physical layer has to be traversed, and one notes that in no case does VM read performance beat that of bare metal.
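This buffering effect can be seen directly with the same dd tool used in our suite. In the sketch below (placeholder file name and size), the first command may report an inflated rate because the write is acknowledged from cache, while the second times the write together with an explicit flush to the physical layer:

    # Buffered write: the reported rate may reflect caching (by the OS or the
    # VM's memory manager) rather than the physical device.
    dd if=/dev/zero of=testfile bs=1M count=256

    # End-to-end cost: time the write together with an explicit flush.
    time sh -c 'dd if=/dev/zero of=testfile bs=1M count=256; sync'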
Another surprising result is the integer and floating point performance of the VirtualBox VM versus bare metal. One might expect that a virtual environment could largely expose the CPU directly to the VM client (without the overhead of the virtual device drivers inherent in disk and other I/O operations), and thus that the client could approach bare metal speeds. But it is difficult to comprehend how the VM could actually best the bare metal Dhrystone 2 results; clearly, there is some very clever engineering at play in VirtualBox.
One final observation is the relatively poor across-the-board performance of the VMware ESXi server compared to the other thin platforms (Xen and KVM). This may be due to ignorance of tuning on our part, but as all platforms were used “out of the box,” we believe this experience would be reproduced by others. Another possible explanation is the difference in VM architecture. Both Xen and KVM rely on dedicated features in both the physical CPU and the guest OS being virtualized. This is called “para-virtualization,” meaning that the VM environment performs some, but not all, of the work; part of it is relegated to the physical CPU [8–12]. Obviously, hardware runs faster than software, but the downside is that only newer hardware and modified OSs can be used. On the other hand, the full virtualization approach used by ESXi can run on older hardware and support an unmodified OS (e.g., Windows NT and 2000), but apparently at a performance cost.
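As a practical aside, one can verify on a Linux host whether the CPU exposes the dedicated virtualization features discussed here:

    # Count CPU flags for Intel VT-x (vmx) or AMD-V (svm); a nonzero result
    # means the hardware assists that Xen and KVM rely on are present.
    egrep -c '(vmx|svm)' /proc/cpuinfo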
Based on the preceding, one can deduce the following recommendations:
For various reasons, we have found it very productive to adopt virtualization in our practice, but this direction is not without its drawbacks. In particular, read performance on local and network disk is negatively impacted, as is floating point performance. Applications that are very sensitive to these requirements may not perform satisfactorily in a virtualized environment. Also, in contrast to expectations, the best performance was often seen from a thick virtualization tool (VirtualBox) rather than the thin hypervisor environments.