USB 3.0 offers roughly a 10:1 increase in signaling rate over USB 2.0 (5 Gbit/s versus 480 Mbit/s), but realizing the full potential of that gain for mass storage devices poses additional challenges. The mainstay mass storage standard for USB 2.0 and 1.1 has been the Bulk Only Transport (BOT) device class specification, which has served the personal computer industry well. BOT has proven simple and inexpensive to implement in microprocessor-based systems, including personal computers, which makes it well suited to inexpensive flash-based mass storage devices; its efficiency, however, is only modest.

In Bulk Only Transport, standard USB Bulk transactions are used for each stage of a mass storage transfer: command, data, and status. Early versions of the BOT standard called it "BBB," short for Bulk/Bulk/Bulk, an enhancement over the older Control/Bulk/Interrupt (CBI) standard, which was, and still is, used mainly for full-speed USB floppy disk drives.

Higher-performance mass storage devices tend to use the SCSI (Small Computer System Interface) standards, which allow devices to support performance-enhancing modes of operation such as multiple outstanding requests and out-of-order request completion. With multiple outstanding requests, the system doesn't need to wait for each request to complete before sending the next one to the device. The device stores the requests in its own buffer, sends a completion back to the host as each request finishes, and immediately starts working on the next buffered request. This keeps the device more fully utilized, provided the USB link can keep up with the additional bus traffic, which USB 3.0 finally can.

But the original BOT standard has no provision for multiple outstanding requests, so a new USB standard was needed. Alongside USB 3.0, an industry initiative produced an improved mass storage class specification, known as UAS (USB Attached SCSI) together with UASP (UAS Protocol). The UAS standard is developed and maintained by the T10 Technical Committee of INCITS (the InterNational Committee for Information Technology Standards), which is accredited by ANSI (the American National Standards Institute). T10 also maintains the SCSI standards. The UASP standard is developed and maintained by the USB Implementers Forum (USB-IF). The UAS and UASP standards work closely together, and most of the key companies in the industry are active members of both T10 and the USB-IF.

A further performance benefit supported by SCSI is out-of-order processing of requests. This allows the mass storage device to execute requests in the best order given its current state and workload. For example, a device with a rotating magnetic or optical platter may process requests that access the same or nearby tracks first, limiting both the number of times the head moves between tracks and the distance it travels each time. A rotating-platter device may also consider where data lies on a track, ordering requests to minimize how far the platter must rotate before each requested sector reaches the head.
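The rotational-ordering idea can be illustrated with a minimal Python sketch. It assumes every pending request targets the track currently under the head, describes each request only by the angle of its target sector, and greedily serves whichever sector will arrive first. The 7200 RPM spindle speed, the `Request` class, and the sector angles are hypothetical values for illustration; real drive firmware uses far more detailed models that also account for seek and transfer time.

```python
from __future__ import annotations

from dataclasses import dataclass

RPM = 7200                 # hypothetical spindle speed
MS_PER_REV = 60_000 / RPM  # milliseconds per platter revolution


@dataclass
class Request:
    name: str
    angle_deg: float  # angular position of the target sector on the current track


def rotational_wait_ms(head_deg: float, target_deg: float) -> float:
    """Time until the target sector rotates under the head (platter turns one way)."""
    delta = (target_deg - head_deg) % 360.0
    return delta / 360.0 * MS_PER_REV


def total_wait_ms(head_deg: float, order: list[Request]) -> float:
    """Total rotational waiting time when requests are served in the given order."""
    total = 0.0
    for req in order:
        total += rotational_wait_ms(head_deg, req.angle_deg)
        head_deg = req.angle_deg  # sector transfer time is ignored in this sketch
    return total


def schedule_by_rotation(head_deg: float, pending: list[Request]) -> list[Request]:
    """Greedy ordering: always serve the request whose sector arrives soonest."""
    remaining = list(pending)
    order = []
    while remaining:
        nxt = min(remaining, key=lambda r: rotational_wait_ms(head_deg, r.angle_deg))
        order.append(nxt)
        remaining.remove(nxt)
        head_deg = nxt.angle_deg
    return order


arrival = [Request("A", 300.0), Request("B", 90.0), Request("C", 200.0)]
best = schedule_by_rotation(0.0, arrival)
print([r.name for r in best])
print(f"arrival order: {total_wait_ms(0.0, arrival):.2f} ms")
print(f"reordered:     {total_wait_ms(0.0, best):.2f} ms")
```

With the sample angles, the greedy order (B, C, A) waits a little under one revolution in total, versus more than one and a half revolutions in arrival order.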

Multiple outstanding requests can be supported over USB 2.0 as well as USB 3.0, and UASP was designed to be usable this way on both. UAS and UASP also define out-of-order processing for both USB 2.0 and USB 3.0. Beyond the raw data rate increase, protocol enhancements in USB 3.0 allow it to implement UAS and UASP even more efficiently than USB 2.0.

The main USB protocol improvements in USB 3.0 include IN-ACK packets in place of separate ACK and IN packets, and ERDY packets, which a device sends to tell the host it is ready to resume a transfer it earlier deferred with an NRDY. In addition, USB 3.0 and the new xHCI (eXtensible Host Controller Interface) standard introduce "streams," which allow a request and its subsequent completion to be associated by a shared stream number. Over USB 2.0, UAS and UASP implementations make that association using the "Read Ready IU" and "Write Ready IU" ("IU" means Information Unit) within a bulk transfer, because stream numbers cannot be specified directly in USB 2.0 packet headers.

A "stream" in USB 3.0 is an abstraction representing a subdivision of a "pipe." A "pipe," in turn, is an abstraction denoting a communication pathway between the host and a particular endpoint in a device. The "pipe" abstraction is used in USB 2.0 as well as USB 3.0, but "streams" are new in USB 3.0.

Performance Estimates and Results

USB 3.0 in Super Speed mode signals at 5 Gbit/s, which yields a maximum potential data throughput of 500 MB/sec after allowing for 8b/10b encoding; various bus overheads reduce this maximum in actual operation. The UASP standard estimates that a sustained data throughput of about 400 MB/sec may be attainable with UASP, compared with an estimated limit of only 250 MB/sec for BOT operation on a USB 3.0 link.
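The 500 MB/sec figure follows from the 5 Gbit/s signaling rate, since 8b/10b encoding carries 8 data bits in every 10 line bits. The short Python calculation below reproduces it (taking 1 MB as 10^6 bytes) and expresses the two estimates quoted above as fractions of that ceiling; the helper name `payload_mb_per_s` is ours, for illustration only.

```python
def payload_mb_per_s(line_rate_bps: int, data_bits: int = 8, line_bits: int = 10) -> float:
    """Payload bandwidth in MB/s (1 MB = 10**6 bytes) after line encoding."""
    payload_bps = line_rate_bps * data_bits / line_bits  # strip 8b/10b overhead
    return payload_bps / 8 / 1_000_000                   # bits -> bytes -> MB


SUPERSPEED_BPS = 5_000_000_000  # USB 3.0 Super Speed signaling rate
max_mb_s = payload_mb_per_s(SUPERSPEED_BPS)
print(f"encoded maximum: {max_mb_s:.0f} MB/s")

# Estimates quoted in the text, expressed as a fraction of that maximum.
for label, estimate in (("UASP", 400), ("BOT", 250)):
    print(f"{label}: {estimate} MB/s = {estimate / max_mb_s:.0%} of maximum")
```

In other words, the UASP estimate leaves about 20% of the encoded bandwidth to protocol overhead, while the BOT estimate leaves half.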

Figure 1 shows actual test results using the Renesas UASP driver and the Renesas µPD720230 USB3-to-SATA3 bridge device. The tests were run with the CrystalDiskMark3 benchmark program under Windows 7 (64-bit) on a PC with a PCIe link running at 5.0 GT/s, a Renesas µPD720201 USB 3.0 Host Controller, and the Renesas USB 3.0 (xHCI) host controller driver. The mass storage device was a Crucial RealSSD solid-state drive behind the µPD720230 bridge. Even in BOT mode, both sequential reads and random reads with a 512 KB block size ran significantly faster than the estimated 250 MB/sec BOT limit, and with UASP enabled they reached up to 369.2 MB/sec. Although UASP did not quite reach 400 MB/sec, it still provided up to a 30% performance improvement over BOT. Gains for writes to the drive were more modest: 2.7% for sequential writes and 1.9% for 512 KB random writes.

Test results can vary widely, however, depending on the benchmark program and its configuration, and on the capabilities and configuration of hardware elements other than the SATA bridge and the USB 3.0 link.

Figure 1: Benchmark software test results (MB/s), showing how UASP improves both sequential and random access to USB storage devices.

Software Elements

Figure 2 shows how the Renesas UASP driver works with the Renesas USB 3.0 Host Controller drivers and other software elements in the complete software solution for supporting USB devices. The top-level application program communicates either with the standard Windows Mass Storage Class (MSC) driver, which operates in BOT mode, or with the new UASP driver, which bypasses the MSC driver and interfaces directly with the Renesas USB 3.0 Host Controller drivers (Root Hub and xHCI). The UASP driver also interacts with the MSC driver as needed.

Figure 2: Renesas’ UASP, Hub and xHCI software stack used with Microsoft Windows XP, Vista and Windows 7.

It should be noted that there is no public standard at present for the interface between the UASP driver and the USB 3.0 xHCI driver. The Renesas UASP driver is designed specifically to operate with the Renesas xHCI driver and other compatible xHCI drivers.

UASP Enables Multiple Command Queuing and Out-of-Order Processing

Figure 3 illustrates the performance benefits of multiple command queuing in mass storage devices. In BOT mode, without command queuing, the device must finish each command before it can accept a new one: it waits for a command to arrive, processes it, sends a completion, and then waits for the completion to be acknowledged. Each command therefore carries considerable waiting time.

Figure 3: UASP benefits from USB 3.0’s new dual-simplex and stream-transfer features.

UASP's command queuing reduces these waiting delays by sending a new command to the device while the device may still be processing a previous one. If a command is already queued in the device, the device can begin processing it as soon as it is ready, without waiting for a new command to arrive. Command queuing can benefit USB 2.0 mass storage devices as well as USB 3.0 devices, if they are designed to support it; through its "pipe" concept, the USB 2.0 link itself already has the ability to support command queuing.
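The effect of queuing can be approximated with a deliberately simplified model: treat each command as a bus stage (command and completion exchange) followed by a device stage (media access). Without queuing the stages run back to back; with queuing they overlap like a two-stage pipeline, so once the pipeline fills, throughput is limited by the slower stage. The Python sketch below uses hypothetical per-command costs and ignores contention between status and command traffic on the bus.

```python
def serialized_time(n: int, t_bus: float, t_dev: float) -> float:
    """BOT-style: bus exchange and device work for each command run back to back."""
    return n * (t_bus + t_dev)


def queued_time(n: int, t_bus: float, t_dev: float) -> float:
    """Queued commands modeled as a two-stage pipeline (bus stage, device stage).

    While the device works on command i, command i+1 is already crossing the bus,
    so after the pipeline fills, each command costs only the slower stage's time.
    """
    if n <= 0:
        return 0.0
    return t_bus + t_dev + (n - 1) * max(t_bus, t_dev)


# Hypothetical per-command costs in microseconds, for illustration only.
T_BUS, T_DEV = 40, 100
for n in (1, 8, 32):
    s, q = serialized_time(n, T_BUS, T_DEV), queued_time(n, T_BUS, T_DEV)
    print(f"{n:>2} commands: serialized {s} us, queued {q} us, speedup {s / q:.2f}x")
```

Under these assumptions the gain approaches (T_BUS + T_DEV) / max(T_BUS, T_DEV), here 1.4x, as the number of queued commands grows; it is largest when bus overhead and device work take similar time.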

Figure 4 depicts a mass storage drive that uses rotating media. A drive that supports command queuing can gain further performance if it can also optimize the order in which the enqueued commands are processed. This capability is called Native Command Queuing (NCQ); the key word is "Native," meaning the reordering is performed by the drive itself. Without NCQ, as shown on the left platter, the commands might be issued and processed in 1-2-3-4 order. Depending on where each target sector lies on the platter, the head might need to move to a different track for each command, often across many tracks. Moving the head takes considerable time, so anything that reduces the number of head movements or the distance of each move (number of tracks) can substantially improve performance.

Figure 4: What is Native Command Queuing (NCQ)? The hard drive optimizes the order in which read and write commands are executed.

UASP can reduce head movement by letting the drive reorder enqueued commands. The platter on the right in Figure 4 illustrates a 3-2-4-1 processing order, in which the head moves only a short distance, to the next adjacent track, for each enqueued command. If two target sectors happen to reside on the same track, the drive may also access both before moving the head again.
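The benefit of reordering can be quantified by counting how many tracks the head crosses. The Python sketch below uses shortest-seek-first (SSTF), a classic textbook disk-scheduling heuristic swapped in for illustration; actual NCQ firmware algorithms are drive-specific and also weigh rotational position. The target tracks are hypothetical values chosen so that SSTF reproduces the 3-2-4-1 order of Figure 4; they are not read from the figure.

```python
from __future__ import annotations


def head_travel(start_track: int, tracks: list[int]) -> int:
    """Total number of tracks the head crosses visiting `tracks` in order."""
    total, pos = 0, start_track
    for track in tracks:
        total += abs(track - pos)
        pos = track
    return total


def shortest_seek_first(start_track: int, pending: dict[int, int]) -> list[int]:
    """Order command IDs nearest-track-first (SSTF); ties go to the lower ID."""
    order, pos, remaining = [], start_track, dict(pending)
    while remaining:
        cmd = min(remaining, key=lambda c: (abs(remaining[c] - pos), c))
        order.append(cmd)
        pos = remaining.pop(cmd)  # head now sits on that command's track
    return order


# Hypothetical target tracks for commands 1-4 (not taken from Figure 4).
targets = {1: 80, 2: 20, 3: 10, 4: 40}
head = 5
in_order = [1, 2, 3, 4]
reordered = shortest_seek_first(head, targets)
print("arrival order", in_order, head_travel(head, [targets[c] for c in in_order]))
print("reordered    ", reordered, head_travel(head, [targets[c] for c in reordered]))
```

With these values, arrival order moves the head across 175 tracks, while the reordered sequence crosses only 75.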

Note also that reducing head movement reduces mechanical wear, which helps extend the useful operating life of drives that use rotating media.

To support NCQ most efficiently, the USB 3.0 specification adds the concept of "streams" within a "pipe." USB 2.0 already implements a "pipe" for each "endpoint," but USB 3.0 streams allow each pipe to be further subdivided into multiple streams. The host and drive can then tag each command and completion with the stream number as well as the pipe (endpoint) number, so that completions can be correlated with their commands.

In USB 2.0, the UAS "tag" field in the Information Units (IUs) carried within Bulk transfers also allows command tagging and NCQ support, but it uses the bus less efficiently than the USB 3.0 "stream" feature.

Additional Details of UAS and UASP

As noted earlier, UAS and UASP originate in the SCSI (Small Computer System Interface) standards, specifically the SAM-4 standard ("SAM" stands for SCSI Architecture Model). The UAS standard defines how SAM-4 can be implemented on a USB link, either USB 2.0 High Speed or USB 3.0 Super Speed; UASP defines additional details of the USB implementation of UAS.

UAS defines the abstract concept of an "I_T Nexus," where "I" means UAS Initiator and "T" means UAS Target. For transferring requests, data, and status between the Initiator and the Target, the "Nexus" includes the "Default" USB pipe (utilizing USB Control Transfers) and four "Bulk" USB pipes:

  • A Bulk Out pipe for UAS Commands from the Initiator to the Target
  • A Bulk In pipe for UAS Status responses (command completions) from Target to Initiator
  • A Bulk Out pipe for UAS write data (from Initiator to Target)
  • A Bulk In pipe for UAS read data (from Target to Initiator)
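On a real device, the host must first discover which bulk endpoint plays each of these four roles; UASP devices advertise this with a class-specific Pipe Usage descriptor attached to each endpoint. The Python sketch below assumes those descriptors have already been parsed into (`bEndpointAddress`, pipe usage ID) pairs. The numeric IDs (1 = command, 2 = status, 3 = data-in, 4 = data-out) match the values used in the Linux kernel's `uas.h` header, and `assign_pipes` is a hypothetical helper, not part of any driver API.

```python
from __future__ import annotations

from enum import IntEnum


class PipeUsage(IntEnum):
    """UAS pipe roles; numeric IDs follow the Linux kernel's uas.h header."""
    COMMAND = 1   # Bulk OUT: Command IUs, initiator -> target
    STATUS = 2    # Bulk IN: status (Sense/Response IUs), target -> initiator
    DATA_IN = 3   # Bulk IN: read data, target -> initiator
    DATA_OUT = 4  # Bulk OUT: write data, initiator -> target

    @property
    def is_in(self) -> bool:
        return self in (PipeUsage.STATUS, PipeUsage.DATA_IN)


def assign_pipes(endpoints: list[tuple[int, int]]) -> dict[PipeUsage, int]:
    """Map each UAS role to a bulk endpoint address.

    `endpoints` holds (bEndpointAddress, pipe_usage_id) pairs; bit 7 of the
    address set means the endpoint is IN (device to host).
    """
    roles: dict[PipeUsage, int] = {}
    for address, usage_id in endpoints:
        usage = PipeUsage(usage_id)  # raises ValueError for an unknown ID
        if bool(address & 0x80) != usage.is_in:
            raise ValueError(f"endpoint 0x{address:02x} has wrong direction for {usage.name}")
        roles[usage] = address
    missing = set(PipeUsage) - set(roles)
    if missing:
        raise ValueError(f"missing pipes: {sorted(m.name for m in missing)}")
    return roles


pipes = assign_pipes([(0x01, 1), (0x82, 2), (0x83, 3), (0x04, 4)])
for usage, address in pipes.items():
    print(f"{usage.name:<8} -> endpoint 0x{address:02x}")
```

Checking the direction bit against the declared role catches a miswired descriptor before any IU is sent.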

UAS commands and status are transported using Information Units (IUs) of various types specified in the UAS standard, while the data itself travels on the data pipes. Every IU begins with a 4-byte header consisting of a one-byte IU ID that specifies the type of IU, a reserved byte, and a two-byte Tag that correlates the IU with its originating command.
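The 4-byte IU header can be packed and parsed with Python's standard `struct` module. This sketch assumes the Tag is big-endian and takes the IU ID values from the Linux kernel's `include/linux/usb/uas.h`; both should be checked against the UAS specification before being relied on. Only the common header is handled, not the type-specific fields that follow it, and the helper names are hypothetical.

```python
from __future__ import annotations

import struct

# 4-byte IU header: IU ID (1 byte), reserved (1 byte), Tag (2 bytes, big-endian).
IU_HEADER = struct.Struct(">BBH")

# IU ID values as listed in the Linux kernel's include/linux/usb/uas.h.
IU_NAMES = {
    0x01: "Command",
    0x03: "Sense",
    0x04: "Response",
    0x05: "Task Management",
    0x06: "Read Ready",
    0x07: "Write Ready",
}


def pack_iu_header(iu_id: int, tag: int) -> bytes:
    """Build the common IU header; the reserved byte is always zero."""
    return IU_HEADER.pack(iu_id, 0, tag)


def parse_iu_header(buf: bytes) -> tuple[str, int]:
    """Return (IU type name, tag) from the first four bytes of an IU."""
    iu_id, _reserved, tag = IU_HEADER.unpack_from(buf, 0)
    return IU_NAMES.get(iu_id, f"unknown (0x{iu_id:02x})"), tag


header = pack_iu_header(0x01, tag=5)
print(header.hex())
print(parse_iu_header(header))
```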

A "pipe" is purely an abstraction. On the actual USB link, fields in the various types of USB packets specify the Device Address, the Endpoint Number, and the Endpoint Direction bit, which together identify a pipe. IUs are transferred over a pipe using USB packet sequences such as IN-DATA-ACK and OUT-DATA-ACK in USB 2.0. In USB 3.0, ERDY, NRDY, and IN-ACK packets replace the separate ACK and NAK handshakes used in USB 2.0.
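Because a pipe is only an abstraction, it can be modeled as a small immutable key. In the Python sketch below (the `Pipe` and `StreamRef` classes are hypothetical), a pipe is the triple of 7-bit device address, 4-bit endpoint number, and direction bit, and a USB 3.0 stream adds a stream ID on top of a pipe. `from_endpoint_address` splits an endpoint descriptor's `bEndpointAddress` field, whose bits 3..0 hold the endpoint number and bit 7 the direction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pipe:
    """A pipe is just the (device address, endpoint number, direction) triple."""
    device_address: int   # 7-bit USB device address
    endpoint_number: int  # 4-bit endpoint number
    is_in: bool           # direction bit: True means IN (device to host)

    def __post_init__(self) -> None:
        if not 0 <= self.device_address <= 127:
            raise ValueError("device address must fit in 7 bits")
        if not 0 <= self.endpoint_number <= 15:
            raise ValueError("endpoint number must fit in 4 bits")

    @classmethod
    def from_endpoint_address(cls, device_address: int, b_endpoint_address: int) -> Pipe:
        """Split bEndpointAddress into number (bits 3..0) and direction (bit 7)."""
        return cls(device_address, b_endpoint_address & 0x0F, bool(b_endpoint_address & 0x80))


@dataclass(frozen=True)
class StreamRef:
    """USB 3.0 only: a stream subdivides one bulk pipe."""
    pipe: Pipe
    stream_id: int


data_in = Pipe.from_endpoint_address(device_address=5, b_endpoint_address=0x83)
print(data_in)
# Frozen dataclasses are hashable, so (pipe, stream) pairs can key a lookup table.
pending = {StreamRef(data_in, 7): "READ, tag 7"}
print(pending[StreamRef(Pipe(5, 3, True), 7)])
```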

At the UAS level, a complete data transfer consists of the following sequence:

  • Command IU from initiator to target
  • Read (or Write) Ready IU from target to initiator
  • One or more data transfers (read or write)
  • Status (Sense IU) from target to initiator
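The four steps above can be written out per command for both link types. This Python sketch follows this article's description that, on a USB 3.0 link, ERDY takes the place of the Read (or Write) Ready IU, and it assumes the stream ID equals the command's tag. `uas_transfer_sequence` is a hypothetical helper that only lists the UAS-level steps and omits the ACK and NRDY packets underneath.

```python
from __future__ import annotations


def uas_transfer_sequence(tag: int, is_read: bool, usb3_streams: bool,
                          n_data: int = 1) -> list[tuple[str, str]]:
    """List the (direction, item) steps of one UAS command.

    Direction is "H->D" (initiator to target) or "D->H" (target to initiator).
    """
    steps = [("H->D", f"Command IU, tag {tag}")]
    if usb3_streams:
        # USB 3.0: the target signals readiness with ERDY on the stream instead
        # of sending a Read/Write Ready IU (stream ID assumed equal to the tag).
        steps.append(("D->H", f"ERDY, stream {tag}"))
    else:
        kind = "Read Ready IU" if is_read else "Write Ready IU"
        steps.append(("D->H", f"{kind}, tag {tag}"))
    data_dir = "D->H" if is_read else "H->D"
    steps += [(data_dir, f"data block {i}") for i in range(n_data)]
    steps.append(("D->H", f"Sense IU (status), tag {tag}"))
    return steps


for direction, item in uas_transfer_sequence(tag=3, is_read=False, usb3_streams=False, n_data=2):
    print(f"{direction}  {item}")
```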

All of these IUs and data blocks are transported within USB Bulk IN or Bulk OUT packet sequences on the actual USB link. On a USB 3.0 link, the Read (or Write) Ready IU is replaced by ERDY, which uses the bus more efficiently than the USB 2.0 approach. Multiple transfers in the same pipe (same initiator and target) can be interleaved, as described in the UAS specification. Each IU is correlated with its originating Command IU by its Tag field (USB 2.0) or by the Stream ID (USB 3.0).
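On the host side, correlating interleaved IUs amounts to keeping a table of outstanding commands keyed by tag. The Python sketch below (class and method names are hypothetical) allocates a tag per Command IU and uses the tag carried by each status IU to find the matching command, regardless of completion order. It assumes the same number serves as the Tag on USB 2.0 and the Stream ID on USB 3.0, and starts tags at 1 because stream ID 0 is reserved in USB 3.0.

```python
from __future__ import annotations

from collections import deque


class OutstandingCommands:
    """Host-side bookkeeping for queued UAS commands, keyed by tag."""

    def __init__(self, max_tags: int = 32) -> None:
        self._free = deque(range(1, max_tags + 1))  # tag 0 is never used
        self._pending: dict[int, str] = {}

    def submit(self, description: str) -> int:
        """Allocate a tag for a new Command IU."""
        if not self._free:
            raise RuntimeError("no free tags: queue is full")
        tag = self._free.popleft()
        self._pending[tag] = description
        return tag

    def complete(self, tag: int) -> str:
        """Match an incoming status IU back to its command and free the tag."""
        if tag not in self._pending:
            raise KeyError(f"status for unknown tag {tag}")
        self._free.append(tag)
        return self._pending.pop(tag)

    def __len__(self) -> int:
        return len(self._pending)


queue = OutstandingCommands(max_tags=4)
tags = [queue.submit(cmd) for cmd in ("READ LBA 100", "READ LBA 9000", "WRITE LBA 200")]
# The device finishes the commands out of order; the tags recover which is which.
for tag in (tags[2], tags[0], tags[1]):
    print(f"tag {tag}: {queue.complete(tag)} done")
print("outstanding:", len(queue))
```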

In effect, UAS and UASP form a layered hierarchy: SCSI and SAM-4 at the top, then the T10 UAS standard, supported by the USB UASP standard and driver, which interfaces with a USB Host Controller driver (EHCI for USB 2.0 High Speed, xHCI for USB 3.0 Super Speed), and finally the hardware, i.e., the actual USB Host Controller and the USB mass storage device. The hardware path can also include up to five levels of USB hubs between the Host Controller and the mass storage device.

Referring again to Figure 3, the transfers summarized by the arrows actually involve UAS status requests as well as UAS commands; the target mass storage device fulfills each status request after the requested data transfer has completed. The exchange also includes many ACK or ERDY packets that are not shown in the figure. Each block labeled "Cmd" actually denotes an OUT-DATA-ACK packet sequence, with the UAS Command IU carried in the DATA packet. Similarly, each block labeled "Sta" (Status) denotes an IN-DATA-ACK packet sequence with the UAS Status IU carried in the DATA packet. The UAS and UASP standards provide detailed protocol exchange diagrams showing all of these transfers at both the hardware and UAS/UASP software levels.

At the highest level of the hierarchy, all of this complexity is hidden from the typical end user, and everything "just works" (provided all the standards are properly followed).