SINGLE-ROOT I/O VIRTUALIZATION
Single-root I/O virtualization’s primary target is existing PCI hierarchies, where single-CPU and multi-CPU computers have the traditional single point of attachment to PCI (Fig. 1, again). One of the significant constraining goals of the single-root spec was to enable the use of existing or absolutely minimally changed root-complex (i.e., chipset) silicon. Likewise, enabling existing or minimally changed switch silicon was a constraint.
Given those requirements, there can still only be a single memory address space from the bus perspective. Partitioning and allocation for the virtualized SIs is performed at a level above the root-complex attachment point. Some type of address translation logic is generally presumed to exist in or above the root complex to enable a “virtualization intermediary” (commonly referred to as a hypervisor) to perform that mapping. New IOV endpoint devices will be required, of course, with their associated non-trivial design and support challenges.
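The partitioning step described above can be sketched in a few lines of Python. This is a simplified illustration, not anything taken from the single-root specification: the names `Window`, `partition`, and `translate` are hypothetical, and the equal-sized contiguous windows are an assumption made for brevity. Real translation logic typically works with page-granular tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Window:
    """A contiguous slice of the single bus address space given to one SI (hypothetical model)."""
    base: int  # first bus address in the window
    size: int  # window length in bytes


def partition(total_size: int, si_names: List[str]) -> Dict[str, Window]:
    """Split one address space into equal, non-overlapping windows, one per SI."""
    share = total_size // len(si_names)
    return {name: Window(base=i * share, size=share)
            for i, name in enumerate(si_names)}


def translate(windows: Dict[str, Window], si: str, local_addr: int) -> int:
    """Map an SI-local address to a bus address, rejecting out-of-window accesses."""
    w = windows[si]
    if not 0 <= local_addr < w.size:
        raise ValueError(f"{si}: address {local_addr:#x} is outside its window")
    return w.base + local_addr


windows = partition(0x1000, ["si0", "si1"])
bus_addr = translate(windows, "si1", 0x10)  # si1's window starts at 0x800
```

The point of the sketch is only that every SI works with its own local addresses while the intermediary keeps them apart in the one address space the bus can see.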
The “don’t change the chipset!” philosophy opens the virtualization market to significant numbers of existing or simply derived systems (e.g., systems that might need only a new BIOS or chipset revision). However, it shifts a substantial burden to the software performing the virtualization intermediary function.
MULTI-ROOT I/O VIRTUALIZATION
The most obvious example implementation of the multiple attachment point hierarchy (Fig. 2, again) is a blade server with a PCI Express backplane, though the PCI Express Cable specification opens up a number of other possibilities. This is a new PCIe hierarchy construct, effectively a (mini) fabric.
Here, the PCI-SIG target was “small” systems with 16 to 32 root ports as likely maximums, though the architecture allows many more. (One of the workgroup’s sayings was “Our yardstick is a yardstick,” i.e., the typical implementation is expected to be a system occupying not more than about three feet cubed.)
Again, retaining the use of existing or absolutely minimally changed root-complex (i.e., chipset) silicon was a key goal. Unlike single root, however, no virtualization intermediary is assumed, and the complexity of partitioning the system moves into a new enhanced type of PCI Express switch (Fig. 2, again), which is called “multi-root aware.”
The key difference in a multi-root system is the partitioning of the PCI hierarchy into multiple virtual hierarchies all sharing the same physical hierarchy. Where single-root systems are stuck with a single memory address space being partitioned among their SIs, multi-root systems actually have a full 64-bit memory address space for each virtual hierarchy. Configuration management software, working in conjunction with the enhanced switch(es) and IOV devices, programs the hierarchy so each root complex from Figure 2 “sees” its portion of the entire multi-root hierarchy as if it were a single-root hierarchy as in Figure 1. Each of those “views” of the hierarchy is called a virtual hierarchy. Each virtual hierarchy of a multi-root system can be independently enabled for single root or not. Therefore, endpoint devices in a multi-root system face the challenge of layering both modes.
Every SI should see its own virtualized copy of the configuration space and address map for a given device being virtualized. Effectively, the device needs “n” sets of PCI configuration space to support “n” of these virtual functions. The single-root specification defines lightweight virtual function definitions to reduce the gate count impact, while the multi-root specification relies on a full configuration space per virtual hierarchy in which the device is usable.
The various “flavors” of configuration spaces are too detailed for this article, which is focused on virtualization at a high level. For the purposes of this discussion, it’s sufficient to note that every SI interacting with an IOV device will have its own device address range and configuration space. Thus, the IOV device can associate work with a particular SI based on which address space was accessed.
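The association between an accessed address and an SI can be illustrated with a small lookup, shown here as a Python sketch. It assumes each SI has been given one non-overlapping address range on the device. The class name `FunctionDecoder`, the SI labels, and the example addresses are all hypothetical, and a real device would perform this decode in hardware.

```python
import bisect


class FunctionDecoder:
    """Maps an accessed address to the SI that owns the enclosing range (hypothetical model)."""

    def __init__(self, ranges):
        # ranges: iterable of (base, size, si_id) tuples, assumed non-overlapping
        self._ranges = sorted(ranges)
        self._bases = [r[0] for r in self._ranges]

    def owner(self, addr):
        """Return the SI whose range contains addr, or None if no range matches."""
        # Find the last range whose base is <= addr, then check its upper bound.
        i = bisect.bisect_right(self._bases, addr) - 1
        if i >= 0:
            base, size, si_id = self._ranges[i]
            if addr < base + size:
                return si_id
        return None


decoder = FunctionDecoder([
    (0x1000, 0x100, "SI-A"),
    (0x2000, 0x100, "SI-B"),
])
requester = decoder.owner(0x2040)  # falls inside SI-B's range
```

Once the owner of an access is known, the device can tag the resulting work with that SI and keep it separate from work submitted by the others.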
VIRTUALIZING THE STORAGE SIDE
At this point in our hypothetical development process, an IOV device was enabled to respond as if it were multiple devices and provided with a mechanism to distinguish between two different SIs. If the implementation were stopped at this point, the model would look like Figure 3. Note that the depictions of SIs don’t attempt to distinguish whether they’re single-root or multi-root. At this point, there’s really only concern that they’re different images. The precise means of connection is unimportant.
Effectively, all SIs see all of the disks connected to the IOV storage controller. In some environments, this model might actually be okay. If the SIs were cooperative, they could divide up the pool of storage themselves. Likewise, if there were some software intermediary between each SI and the storage controller, it could divide up the pool of storage and allow an SI to see only a portion of the pool.
Considering the example at the beginning of this article, users could be uncomfortable with their banking system “cooperating” with the crew at www.hackers-are-we.org. While the software intermediary idea would be okay, it would eliminate a lot of the performance savings of doing IOV in hardware, and it would be a rather complex piece of software needing intimate knowledge of each controller’s hardware and device driver. Clearly, then, for most environments, hardware virtualization of the storage side is desirable.
SAS TO THE RESCUE
Therefore, it’s not a difficult stretch to imagine that a creative IOV storage controller designer could add a straightforward table mechanism to filter out disk drives by their ID and only let certain SIs “see” certain disk drives. Such a system would look like Figure 4, where each colored SI has access to the same color disk drive(s).
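A table mechanism of this kind might look roughly like the following Python sketch. It is only an illustration of the filtering idea: the `DriveMask` class, the SI names, and the drive IDs are invented for the example, and an actual controller would keep such a table in hardware or firmware.

```python
class DriveMask:
    """Per-SI visibility table for disk drives (hypothetical model)."""

    def __init__(self):
        self._visible = {}  # si_id -> set of drive IDs that SI may see

    def grant(self, si_id, drive_id):
        """Allow one SI to see one drive."""
        self._visible.setdefault(si_id, set()).add(drive_id)

    def can_access(self, si_id, drive_id):
        """Check a single request against the table."""
        return drive_id in self._visible.get(si_id, set())

    def visible_drives(self, si_id, attached):
        """Filter the list of attached drives down to what this SI may see."""
        allowed = self._visible.get(si_id, set())
        return [d for d in attached if d in allowed]


attached = ["disk0", "disk1", "disk2", "disk3"]  # example drive IDs
mask = DriveMask()
mask.grant("red", "disk0")
mask.grant("red", "disk1")
mask.grant("blue", "disk2")

red_view = mask.visible_drives("red", attached)
blue_view = mask.visible_drives("blue", attached)
```

An SI with no entries in the table sees no drives at all, which is the safer default for the mutually distrustful SIs discussed earlier.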
Historically, this could have been done fairly easily in a SCSI environment, where SCSI even provided facilities for subdividing a single disk drive. Even a SATA controller today could probably handle this sort of per-disk drive “masking.”