What's The Difference Between ONFI 2 And ONFI 3
The newest Open NAND Flash Interface (ONFI) 3 standard offers many benefits for high-performance NAND Flash applications. This article describes many of the improvements from prior ONFI releases, and outlines the design requirements necessary to obtain the higher performance capabilities. ONFI 3 offers these key improvements for systems design:
- Performance of 400M transfers/s
- On-die termination (ODT)
- Reduced signal level (1.8 V)
At 400M transfers/s, ONFI 3 runs at twice the performance of the previous ONFI 2 specification. ONFI 3 also offers useful features such as on-die termination, reduced signaling voltage, warm-up cycles, and volume addressing. This article reviews the ONFI 3 and related Toggle 2 specifications, and shows how they benefit system and controller designers while providing higher levels of performance and reducing costs.
What market segments will utilize this latest release? One target market is larger array applications due to the increase in the number of parallel Flash channels (a controller is a channel), and the high number of NAND Flash dies. Low-power applications will also be a candidate, due to the lower voltage, on-die termination, and higher performance provided by ONFI 3. A review of how ONFI and NAND have progressed is important to understand why a migration is necessary to continue the growth of NAND in current and new applications.
A History of ONFI and Toggle Specifications
The early days of NAND Flash were very trying for manufacturers of controllers and host device solutions. Interface timing, page ID location, and format were typically vendor-dependent, which made configuration and booting very difficult. Command sequences were also specific to the device, making development of state machines more difficult. Many solutions at the time were software-based because of the differences among vendors. Standardization was clearly needed, and the first industry-accepted standard was ONFI 1.0.
ONFI 1.0
The initial ONFI 1.0 specification standardized the device information location, format, and interface timing, as well as the command set. This made controller and software design much easier, and enabled DMA operation and special command sequencing. Commands for multi-plane, cache, and other functions were easier to pipeline. This allowed for faster integration, fewer software modifications, and easier device qualification.
ONFI 2.0
ONFI 2.0 established the first double data rate (DDR) interface for NAND devices. The asynchronous ONFI 1.0 interface was limited to 50MBps but typically achieved no more than 43MBps, whereas the DDR interface could obtain 133MBps. This provided a very cost-effective solution given the technology trends of the industry for page size and larger data transfers as well as improvements for random performance. The ONFI 2.0 specification was released in February of 2008.
Toggle 1
Samsung initially introduced this related specification, and Toshiba later adopted the clock-less Toggle DDR interface. Unlike the ONFI standard, which followed the DRAM interface specification, Toggle 1 used a bi-directional DQS for all data transfers and used read and write commands to determine direction. Both suppliers also agreed on the commands and timing. The two interfaces (ONFI and Toggle) are working toward a common standard, but this work is still in committee.
ONFI 2.1
ONFI 2.1 added new features, most importantly performance up to 200M transfers/s. It also brought other enhancements for performance and error correction coding (ECC). One addition was logical unit number (LUN) addressing, which made it possible to reduce package pins by not requiring individual chip enable pins. ONFI 2.1 also improved performance over polling mode, and offered more flexibility for maximizing performance through interleaving. It was also the first NAND Flash interface to be used in a shared environment with mobile devices. The ONFI 2.1 (DDR2NVM) specification was ratified in January of 2009.
ONFI 2.2
ONFI 2.2, the most widely used specification today, was ratified in October of 2009. ONFI 2.2 provided several useful new features:
- Individual LUN reset
- Enhanced program page register clear
- New Icc specs and measurement
ONFI 2.3a
This specification was introduced mostly to define the EZ NAND device, which includes control logic with NAND to perform NAND error correction. The specification also introduced volume addressing.
ONFI 3
The latest ONFI specification improves functionality for large NAND array applications. The most important improvement is the bandwidth increase to 400M transfers/s, which aligns nicely with the 8K page-size increase commonplace in NAND Flash. Other new features include on-die termination, additional capabilities for multi-die packaging, and enhancements for volume addressing. The 1.8-V-only interface facilitates the selection of I/Os and processes for the controllers. Overall, ONFI 3 makes it easier for the designer to achieve a successful implementation. The specification was released in March 2011.
Toggle 2
Following the trend for larger page sizes and high-performance interfaces, the Toggle interface also improved its performance to meet the 400M transfers/s target. New features such as on-die termination are also supported in Toggle 2. Not all Toggle 2 data sheets are currently available, so check with your supplier.
Both interfaces are supported by the same Controller and PHY solution (Fig. 1) because they have similar interfaces (Table 1).
Feature | SDR | Toggle 1 | Toggle 2 | NV-DDR (ONFI 2) | NV-DDR2 (ONFI 3) |
---|---|---|---|---|---|
Data edge | SDR | DDR | DDR | DDR | DDR |
Maximum transfer (M transfers/s) | 40 | 133 | 400 | 200 | 400 |
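To put the rates in Table 1 in perspective, the short sketch below estimates the ideal bus time needed to move one 8K page at each interface's maximum rate. It assumes an 8-bit data bus (one byte per transfer) and ignores spare-area bytes and command overhead. The function and variable names are illustrative only.

```python
# Assumption: 8-bit data bus, so one transfer moves one byte.
PAGE_BYTES = 8 * 1024  # 8K page, spare area ignored

# Maximum transfer rates taken from Table 1, in M transfers/s.
MAX_RATE_MT_S = {
    "SDR": 40,
    "Toggle 1": 133,
    "NV-DDR (ONFI 2)": 200,
    "Toggle 2": 400,
    "NV-DDR2 (ONFI 3)": 400,
}


def page_transfer_us(rate_mt_s, page_bytes=PAGE_BYTES, bus_bytes=1):
    """Ideal bus time in microseconds to move one page (no command overhead)."""
    transfers = page_bytes / bus_bytes
    # M transfers/s is the same as transfers per microsecond.
    return transfers / rate_mt_s


for name, rate in MAX_RATE_MT_S.items():
    print(f"{name:18s} {page_transfer_us(rate):7.2f} us per 8K page")
```

Under these assumptions, doubling the transfer rate from 200M to 400M transfers/s halves the time the bus is occupied by each page.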
ONFI 3 Benefits
Why are faster bus interface speeds necessary? Several trends call for faster bus speeds:
- Page size increases
- Longer write cycle times (Tprog)
- Increases in die per channel
Trend | Values |
---|---|
Page size migration | 1K, 2K, 4K, 8K; soon 16K |
Write cycles (MLC typ.) | 800, 1200, 1500 µs |
Program erase cycles | 1500, 2200, 3500 µs |
Not all of these trends affect every application. Some applications may not see a huge number of writes compared to a cache application, where writes are the most limiting parameter. For embedded applications where booting and code are the dominating attributes, error correction coding (ECC) and known-good boot blocks are the most important features. ECC has been increasing at each process node shrink (Fig. 2). Below is a generalized chart to show what the expected ECC requirements will be at each node. (Check with your supplier for more accurate requirements.) Your particular system may require more or less depending on your application, design, and firmware.
To improve performance, additional NAND devices are required, either on a per-channel basis or through parallel channels (parallel Flash controllers). Multiple devices on a single channel permit overlapping accesses, and interleaving between devices is a normal mode of operation. This requires the controller to utilize command pipelining and improve bus efficiency. As a rule of thumb, if the page size doubles, then the bus transfer rate should double as well to keep the page transfer time constant. Figure 3 shows array and bus transfer times.
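The interaction between array time, bus time, and the number of interleaved devices can be sketched with a simple model. This is a rough approximation, not a description of any particular controller: it assumes an 8-bit bus, ideal pipelining, no command overhead, and that each die alternates between one bus transfer and one array operation. The 1500 µs program time is the largest typical MLC write-cycle value quoted above; the function name is hypothetical.

```python
def channel_throughput_mb_s(n_dies, page_bytes, array_us, rate_mt_s):
    """Approximate steady-state throughput (MB/s) of one interleaved channel.

    Simplified model: every die repeats "bus transfer, then array operation",
    and transfers on the shared bus never overlap each other.
    """
    xfer_us = page_bytes / rate_mt_s  # 8-bit bus: one byte per transfer
    bus_bound = page_bytes / xfer_us  # bytes per microsecond equals MB/s
    array_bound = n_dies * page_bytes / (array_us + xfer_us)
    return min(bus_bound, array_bound)


PAGE_BYTES = 8 * 1024
for rate in (200, 400):
    for n_dies in (1, 8, 16):
        mb_s = channel_throughput_mb_s(n_dies, PAGE_BYTES, array_us=1500, rate_mt_s=rate)
        print(f"{rate} MT/s, {n_dies:2d} dies: {mb_s:6.1f} MB/s program throughput")
```

In this model, program throughput scales almost linearly with the number of dies until the bus limit is reached, which illustrates why both more devices per channel and a faster bus are needed. Operations with much shorter array times, such as reads, reach the bus limit with far fewer dies.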
Moving to ONFI 3 or Toggle 2 can provide additional value for system design. A key criterion for storage is cost, which is influenced by several factors such as the number of channels (Flash controllers) and the quantity of NAND devices. A solid-state drive (SSD)-type application using 8 channels may be able to decrease to 6 channels, saving 16 NAND dies, 20 to 30 pins per channel, and associated power.
Reducing pin count
Pins add cost, not only for packaging and for silicon, but also because they require additional power. As such, decreasing the pin count for controller designs while reducing the layout area on the PCB will reduce cost and enable other system improvements. The addition of differential signaling to enable 400 M transfers/s is not desirable for overall pin reduction. However, for some designs it does improve noise margins.
To offset the additional pins, the ONFI 3 specification added a chip enable (CE_n) pin reduction scheme. Several CE_n pins can be reduced to a single shared CE_n pin. Enabling CE_n reduction requires a volume address assignment at initialization.
Two new NAND device-only pins are added to each NAND package. One pin is an input (En_In) and the other is an output only (En_Out). These pins are connected to configure the NAND packages in a daisy chain. At initialization, only the first package in the daisy chain will accept a volume address assignment. After the first package has an assigned volume address, its En_Out pin will be pulled high, allowing the second package to accept a volume assignment. This sequence continues until all packages that share the same CE_n pin have unique volume addresses. At that point, the host controller can address each package by its volume address over the shared CE_n, whereas prior architectures required a separate CE_n for each package (Fig. 4).
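The daisy-chain sequence described above can be illustrated with a toy model. This is a behavioral sketch only, not the command sequence from the specification: the class and function names are hypothetical, and it assumes the first package's En_In is treated as high because no other package drives it.

```python
class NandPackage:
    """Toy model of one NAND package in an En_In/En_Out daisy chain."""

    def __init__(self, name):
        self.name = name
        self.volume = None    # no volume address assigned yet
        self.upstream = None  # package whose En_Out drives this En_In

    @property
    def en_in(self):
        # Assumption: the first package in the chain sees En_In as high.
        return True if self.upstream is None else self.upstream.en_out

    @property
    def en_out(self):
        # En_Out goes high once this package holds a volume address.
        return self.volume is not None

    def accepts_assignment(self):
        return self.en_in and self.volume is None


def build_chain(names):
    """Connect each package's En_In to the previous package's En_Out."""
    chain = [NandPackage(n) for n in names]
    for prev, cur in zip(chain, chain[1:]):
        cur.upstream = prev
    return chain


def assign_volumes(chain):
    """Host-side loop: one assignment per package sharing the CE_n."""
    for next_volume in range(len(chain)):
        # Every package sees the command; only one is eligible to take it.
        takers = [p for p in chain if p.accepts_assignment()]
        if len(takers) != 1:
            raise RuntimeError("daisy chain is not wired as expected")
        takers[0].volume = next_volume


chain = build_chain(["pkg0", "pkg1", "pkg2", "pkg3"])
assign_volumes(chain)
for pkg in chain:
    print(pkg.name, "-> volume", pkg.volume)
```

The key property the model shows is that exactly one package is eligible for each assignment, so the host never needs a separate CE_n to tell the packages apart.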
The ONFI 3 NV-DDR2 interface provides the features required to achieve 400 M transfers/s on the NAND interface. While Toggle 2 can serve as a compatible alternative, the ONFI 3 NV-DDR2 interface offers additional features that add flexibility and reduce cost. Work is ongoing in an ONFI-JEDEC Joint Task Group to achieve a common 400 M transfers/s interface.
On-die Termination Techniques
One feature available from both ONFI 3 and Toggle 2 is on-die termination (ODT). This improves noise margins, reduces signal reflections, affects slew rates, and improves the interface signal integrity. Not all implementations will work in the same manner; the information contained here targets the ONFI 3 implementation. When ODT is enabled, the device or host can dynamically switch the termination circuitry on only when it is needed, which avoids unwanted power consumption.
ODT uses several combinations of resistors. The internal ODT for the ONFI 3 specification can provide RTT values of 30 ohms (optional), 50 ohms, 75 ohms, 100 ohms, and 150 ohms. The on-die terminator also reduces the impact of external resistors/packs. For large array applications, ODT also works very effectively with double-buffered devices to reduce the individual die capacitive load, permitting increased die package stacking and reducing the external bus length and associated electrical parasitics. When an ODT control circuit is used, the Flash controller/PHY manages the ODT resistance through registers. The ability to dynamically change so many parameters to obtain optimized system performance with minimal power for storage systems offers tremendous benefits for both mobile and corded applications.
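As a small illustration of working with the discrete RTT settings listed above, the sketch below picks the supported value closest to a desired termination resistance. The helper name and the idea of selecting by nearest value are assumptions for illustration; the actual register encoding and the right termination for a given board come from the device data sheet and signal-integrity analysis.

```python
# RTT values named in the text, in ohms. 30 ohms is optional for a device.
MANDATORY_RTT_OHMS = (50, 75, 100, 150)
OPTIONAL_RTT_OHMS = (30,)


def nearest_rtt(target_ohms, supports_30_ohm=False):
    """Return the supported RTT value closest to target_ohms (hypothetical helper)."""
    choices = MANDATORY_RTT_OHMS
    if supports_30_ohm:
        choices = OPTIONAL_RTT_OHMS + choices
    return min(choices, key=lambda rtt: abs(rtt - target_ohms))


print(nearest_rtt(60))                        # closest mandatory value
print(nearest_rtt(35))                        # device without the 30-ohm option
print(nearest_rtt(35, supports_30_ohm=True))  # device with the 30-ohm option
```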
I/O Selection
The wide range of options available from device suppliers makes it quite difficult to develop a controller that can utilize multiple device types and multiple source suppliers. The best approach is to use a combo I/O pad. Table 3 below uses simulation data to show how well the combo I/O pad will work with the ODT used in ONFI 3 devices. The I/O supports DDR1/2/3 receivers, HSTL15/HSTL18/SSTL18/LPDDR1/LPDDR2 drivers, and a PVT-compensated pad. This will allow the use of the pad as an external memory interface as well as the NAND Flash interface.
Driver | Ohms | Receiver | Ohms |
---|---|---|---|
HSTL 1.5 | 39 | DDR 3 | 100 |
HSTL 1.8 | 22 | DDR 3 | 120 |
SSTL 1.8 | 34 | DDR 2 | 150 |
SSTL 1.8 | 19 | DDR 2 | 50 |
SSTL 2.5 | 39 | DDR 2 | 60 |
SSTL 2.5 | 21.5 | DDR 2 | 75 |
SSTL 1.5 | 34 | | |
These values correspond well with the ONFI 3 specification for Ron values covered under section 4.1 of the specification.
Good Design Considerations for ONFI 3
With a block page addressing scheme, NAND Flash makes boundary alignments difficult, making it harder to maintain performance for all applications. The general NAND device trend toward larger page sizes complicates matters even more for the controller and firmware. As long as full-page transfers are adhered to, the transfer efficiency is excellent. However, most applications today rely on partial page reads to minimize the transfer overhead. New applications are pushing the envelope of controller designs, file systems, and NAND array accesses.
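The boundary-alignment problem can be made concrete with a short sketch that splits a host request into the per-page accesses a controller would have to issue. The function name and the 8K default page size are illustrative assumptions, and spare-area bytes are ignored.

```python
def split_into_page_accesses(offset, length, page_bytes=8192):
    """Return (page_index, column, nbytes) for each page touched by a request."""
    segments = []
    end = offset + length
    while offset < end:
        page, column = divmod(offset, page_bytes)
        # Stop at the page boundary or at the end of the request.
        nbytes = min(page_bytes - column, end - offset)
        segments.append((page, column, nbytes))
        offset += nbytes
    return segments


# A 4K request that starts 6K into a page straddles two pages.
for page, column, nbytes in split_into_page_accesses(offset=6144, length=4096):
    print(f"page {page}: column {column}, {nbytes} bytes")
```

An unaligned request like this one turns a single host access into two partial-page reads, each with its own array access, which is why alignment has such a large effect on random performance.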
The industry is responding with faster and higher density solutions to reduce per-bit costs and continue the growth of NAND Flash. The escalating recommended minimum ECC threshold and the bandwidth increases from the new interfaces will typically become limiting factors in designs, and the ECC logic will typically be the largest gate-count component. Many error correction solutions do not adapt well due to increases in the number of finders and re-read operations, and might not be scalable for the increased packet rate.
The capabilities available from the NAND interface have improved tremendously. The chart (Fig. 5) uses a base 16-Gb device with an 8K page size and normalized MLC parameters to demonstrate the transfer efficiency for asynchronous, Toggle 1, ONFI 2, and ONFI 3 interfaces. The chart assumes a single controller and 8 devices (separate or stacked die) with 10% margin for bus transfer, a small amount of software overhead, and no reclaim. The results clearly show the impact of the interface. Unlike DRAM, which is still able to achieve array performance gains, NAND array performance is not improving, and in write-heavy applications the time to complete a write cycle is getting much worse.
This comparison shows why parallel controllers are necessary to achieve high performance in applications such as SSDs and caching. Improving the performance on a per-channel basis can reduce the number of channels required and reduce the SoC cost and power consumption.
Interleaving between Flash devices requires a command pipeline that can keep up with the demands of the channel. Coupled with an efficient bus turnaround time, this is extremely important for maximizing efficiency. As mentioned earlier, we used a 10% overhead for the data, but in the actual system the number needs to be closer to 5%. The process by which you create the command pipeline needs to have a low latency, and the commands need to reside close to the bus to reduce latency and help achieve a balance across the channels. This is one of the key areas where the increased speed of 400M transfers/s improves the efficiency for commands between devices.
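The effect of bus overhead on usable bandwidth is simple arithmetic, sketched below. The model treats overhead as the fraction of bus time spent on anything other than payload data (commands, addresses, turnaround), which is an assumption about how the 10% and 5% figures are meant; the function name is hypothetical.

```python
def effective_rate_mt_s(raw_rate_mt_s, overhead):
    """Payload rate left after a fraction `overhead` of bus time is non-data."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    return raw_rate_mt_s * (1.0 - overhead)


for overhead in (0.10, 0.05):
    rate = effective_rate_mt_s(400, overhead)
    print(f"{overhead:.0%} overhead -> about {rate:.0f} M transfers/s of payload")
```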
The ability to add more devices per channel for performance and redundancy is supported with new package options and double-buffered I/Os. This reduces the capacitive load, permitting the number of devices to increase up to 16 per channel, making the bus efficiency and command pipeline even more important. For non-boundary-aligned accesses (partial page), hardware accelerators are extremely valuable and will reduce the latency and improve CPU availability while maximizing random performance.
The physical layer (PHY) performance and I/O selection are critical for all market applications. Most designers are aware of the DRAM PHY, which has been around for many years. The NAND PHY serves the same purpose as a physical interface to the memory. However, it requires a bypass mode for legacy devices as stipulated in the ONFI specification. This adds some complexity to the design and changes the interface from the controller to the PHY.
The I/O selection is another key component, especially given the ODT and the number of devices and system power requirements. The I/Os that provide the most flexibility are multi-function I/Os that provide programmable power and support both LVCMOS and DDR2. Although these I/Os require more connections, they will provide a very robust design that can be used for many configurations.
Conclusion
The performance and power improvements offered by the latest generation of ONFI and Toggle specifications are definitely game changing. New SSD price points and improvements in system performance are inevitable. These gains can be wasted in the implementation, however, especially given the software and hardware architectural variations used today. Hardware designers should pay attention to the following:
- I/O selection (vendor flexibility, ODT, power improvements, data reliability)
- PHY support for calibration
- Device interleaving (performance, channel efficiency, latency)
- Command pipeline (channel efficiency, latency, command efficiency)
- ECC non-blocking (at-speed Tx, Rx limited buffering, improves channel-to-channel latency)
- Partial page operation
The increase in page size and write performance is a growing trend as NAND process technology shrinks, and it works very nicely with the increase in bus speed provided by ONFI 3. Program erase cycle times are also increasing, which means devices are not accessible for longer periods of time, although the problem can be mitigated by more channels and fewer devices or by system software. ONFI 3 does change the requirements of your controller or your purchase of IP, which should have features to reduce the effects of these key items. Next-generation products using ONFI 3 will provide an improved user experience at a better price point than at any other time in the history of NAND Flash.