What you’ll learn:
- Why NAND flash is often misunderstood.
- Details about some of the most commonly held NAND flash myths.
- Insights into important design considerations for NAND flash, including temperature range, density, power etc.
Invented in 1987 by KIOXIA (then Toshiba), NAND flash has fundamentally changed the way we live. Yet, as pervasive as this important technology is, it’s still often misunderstood.
Totally understandable. From temperature range to density to power, the design considerations for NAND flash are many and varied. The depth and breadth of available flash-based products for next-generation storage applications is staggering, and design engineers have questions. We’re here to help.
Keep reading as we dispel some of the most commonly held myths about the transformative technology.
1. QLC flash will replace TLC flash, which replaced MLC flash, which replaced SLC flash.
QLC (quad-level-cell) flash is increasingly visible in the storage industry as solid-state drive (SSD) capacities become bigger and cheaper. While QLC flash is the most economical, TLC (triple-level-cell) flash offers better performance and reliability for a variety of mainstream storage applications. QLC and TLC will continue to coexist, as each is well-suited to specific applications.
For example, QLC is better for read-intensive applications while TLC is better for higher-performance mixed workload and write-intensive applications. As long as the disparities between QLC and TLC performance and reliability exist, QLC is unlikely to replace TLC entirely.
2. While getting denser and cheaper, 3D flash isn’t getting faster.
Generation over generation, 3D flash continues to increase in density and performance while becoming less expensive. New design strategies and features are being implemented to boost performance. For example, increasing the number of planes, combined with new features such as virtual multi-LUN (VML) read that allow each plane to be read independently at any time, increases random-read IOPS. In addition, new standard specifications such as Toggle DDR5 push NAND interface speed up to 2.4 Gb/s.
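For context, that per-pin figure translates directly into peak bus bandwidth. A quick back-of-the-envelope calculation, assuming the conventional 8-bit-wide NAND I/O bus (the bus width is our assumption, not stated above):

```python
# Rough arithmetic only; the 8-bit bus width is an assumption.
pin_rate_gbps = 2.4    # Toggle DDR5 per-pin transfer rate, Gb/s
bus_width_bits = 8     # typical NAND flash I/O bus width (assumption)

# Each of the 8 pins moves 2.4 Gb/s, so the bus moves 2.4 GB/s total.
peak_gbytes_per_s = pin_rate_gbps * bus_width_bits / 8
print(peak_gbytes_per_s)   # -> 2.4 GB/s peak interface bandwidth
```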
3. Data will always live on hard-disk drives because they have the lowest cost per bit.
Usually, the technology offering the lowest cost wins in the long term. Hard drives continue to offer the lowest cost per stored bit, but solid-state storage is closing the gap, helped by its far shorter data-retrieval times.
SSDs will steadily grow in market share thanks to increasing density, lower cost per bit, and better performance per capacity. Performance per capacity is an important metric because if drive capacity grows but read and write performance can’t scale with it, total IOPS per gigabyte worsens and the number of users a drive can serve becomes bottlenecked.
4. Serial interfaces are replacing parallel interfaces at all levels of design.
For I/O interfaces, this has definitely been a long-term trend: PATA to SATA, PCI to PCI Express, eMMC (embedded MultiMediaCard) to UFS (Universal Flash Storage). But for raw memory interfaces, it’s more of a mixed bag.
While it is true that parallel NOR flash has been mostly replaced with serial NOR flash, DRAM and NAND flash have maintained their multibit wide buses. The latency and cost of implementing very-high-speed I/O circuits on commodity chips will most likely prevent the adoption of high-speed serial on DRAM or NAND for the foreseeable future.
5. Managed NAND (eMMC, UFS, PCIe SSDs) is always a better solution than raw NAND.
NAND flash memory with a controller (managed NAND) continues to be the easiest-to-use solid-state storage device because it’s a complete non-volatile storage system. Using raw NAND flash requires management by the host processor or a controller chip: logical-to-physical block translation, bad-block management, and error correction. Managed NAND is therefore easier to use, since it presents itself to the system as an ideal block-storage device.
However, raw NAND has its place. For example, it wouldn’t make sense to build an SSD or flash array out of managed NAND devices, due to the increased cost and the lower performance caused by the added latency of cascaded controllers.
Managed NAND is best used as a black-box storage subsystem. However, sometimes you need the flexibility to implement your own storage subsystem for performance or cost reasons, in which case utilizing raw NAND memory with your own architecture and firmware is the only way to go.
6. SD and microSD cards will lose their dominance as a removable storage form factor in two short years.
The SD card (along with microSD) has long been the most widely used memory form factor in the world. Recently, high-resolution recording and 5G mobile applications have created demand for higher bandwidth, and new form factors supporting PCIe and NVMe are being developed. These new form factors, such as KIOXIA’s XFMEXPRESS, are now close to being standardized, meaning that much faster performance than the legacy SD interface is close at hand.
All that being said, the SD/microSD card isn’t going away any time soon. There’s simply no other form factor more accepted or as tiny.
7. ECC performed internally in the memory is more efficient than the ECC handled by the host controller.
Error correcting code (ECC) is a form of forward error correction in which check bits are computed and added to the data the user wishes to store. Then, the user data and ECC are stored on the storage medium; in this case, a NAND flash memory chip. Error bits that occur are detected upon readout and can be corrected depending on the strength of the ECC and the number of error bits. However, all of this takes processing power—first to compute the check bits before storage, and second to correct the error bits upon readout.
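The compute-on-write, correct-on-read principle can be sketched with a toy code. The Hamming(7,4) example below is far weaker than the BCH/LDPC codes real NAND controllers use, and is purely illustrative: three check bits are computed over four data bits before storage, and on readout a syndrome pinpoints and flips any single erroneous bit.

```python
# Toy forward error correction: Hamming(7,4). Three check bits protect
# four data bits, so any single flipped bit can be located and corrected.
# (Real NAND ECC is far stronger; this only shows the principle.)

def encode(d):
    """Compute check bits for 4 data bits; return the 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                    # parity over positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                    # parity over positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                    # parity over positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]  # codeword positions 1..7

def decode(c):
    """Recompute parities on readout; syndrome locates a single error."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3      # 0 = clean; else error position
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1             # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]      # extract the data bits

data = [1, 0, 1, 1]
stored = encode(data)
stored[4] ^= 1                  # simulate a single bit error in the flash
assert decode(stored) == data   # corrected on readout
```

Note that both steps cost processing: `encode` before every write, `decode` after every read, which is exactly the overhead that must live either on the NAND die or in the controller.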
But where should the ECC be done? On the memory die, or on a controller chip? The answer is: It depends. If the number of NAND dies required is one, then having ECC circuitry on the NAND die itself is convenient because it makes the NAND look error-free.
This is very useful when connecting to smaller microcontrollers powering smart devices and IoT devices, because many of these processors lack the hardware for an ECC engine and doing ECC in software is relatively slow and inefficient. In addition, a NAND with built-in ECC enables the use of a newer NAND lithography with older processors that don’t support the higher ECC requirements of a newer, smaller NAND geometry.
On the other hand, if the total number of NAND dies required in the application is higher, it makes economic sense to not burden every NAND die with the overhead of ECC circuitry and simply put the ECC engine in the controller. Higher processing speed is usually possible since the controller will be designed using a logic process versus a memory process. In an SSD, with one controller connected to many NAND die, the overall design will be less expensive if there’s one ECC engine in the controller.
8. UFS isn’t much faster than eMMC.
Actually, it is.
While it's true that certain bottlenecks make it difficult to operate at the maximum interface speed for both eMMC and UFS, the actual performance differences are significant. The maximum interface speed of UFS v3.1, at 2320 MB/s, is about six times that of eMMC v5.1 at 400 MB/s. And though the actual performance of eMMC and UFS parts falls below these figures, there are still big differences.
For example, we may see eMMC sequential read speeds of around 325 MB/s, while that of UFS is over 2000 MB/s. Moreover, large differences in performance are found for sequential write and random read and write as well.
9. The best design-in option is always the latest version of UFS over eMMC.
While it's true that UFS v3.1 provides the best eMMC/UFS performance, a number of factors can make it less than ideal. The memory density needed may be a big factor. To take advantage of UFS's faster interface, multiple die are required within the device for interleaving. For this reason, UFS typically isn’t supported at densities below 32 GB. This means we can expect applications that only need, say, 4, 8, or 16 GB to continue to use eMMC.
This plays a role in the version of UFS supported at certain densities as well. At 32 and 64 GB, UFS typically continues to be supported at v2.1. That’s because by the time the v3.0/3.1 interface emerged, the minimum available 3D die densities were too large to allow the multiple-die interleaving at these capacities needed to take advantage of the faster interface. In fact, constructing 32- or 64-GB UFS devices from the newer 3D-generation die would degrade performance, because there would be fewer die to interleave.
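The arithmetic behind this is simple. Using hypothetical die densities (the figures below are illustrative, not tied to any specific NAND generation):

```python
# Illustrative only: how many NAND die fit in a device of a given
# capacity, which bounds how many ways the interface can interleave.
def die_count(device_gb, die_gb):
    """Number of die in the device (at least one)."""
    return max(1, device_gb // die_gb)

# A 32-GB device built from hypothetical 16-GB vs. 64-GB die:
print(die_count(32, 16))  # -> 2 die: two-way interleaving possible
print(die_count(32, 64))  # -> 1 die: no interleaving at all
```

With only one die, the device can't overlap operations across die, so a faster interface generation can't be fed fast enough to pay off at that capacity.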
As the layer count for 3D flash die increases each generation, the minimum die density increases for that generation, too. What interface the SoC supports is another factor, among others.
10. eMMC/UFS endurance can be specified in terabytes written (TBW).
TBW is the total number of terabytes that can reliably be written to the flash device over its lifetime. It’s a popular endurance specification for SSDs, and some entities are also starting to specify or request TBW as an endurance capability for eMMC and UFS.
However, TBW as a specification can’t be relied on by itself to understand how many terabytes can actually be written to the device. That’s because it neglects to account for write amplification, which reduces the amount of host data that can be written, and write amplification varies depending on the access pattern from the host processor.
A designer should work with the flash supplier to understand how the access pattern for their application's use case impacts the true endurance capability of the flash device, and how to potentially optimize the access pattern to the flash device to extend its longevity.
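As a rough sketch of why the access pattern matters so much: the physical write budget is fixed by capacity and program/erase (P/E) cycles, and the write amplification factor (WAF) determines how much of that budget reaches the host. All numbers below are illustrative assumptions, not any real part's specification.

```python
# Hypothetical sketch: how write amplification erodes host-visible
# endurance. Every figure here is illustrative, not a real device spec.
def host_tbw(capacity_gb, pe_cycles, waf):
    """Terabytes the host can write before the rated P/E cycles are consumed."""
    physical_tb = capacity_gb * pe_cycles / 1000   # total physical writes, TB
    return physical_tb / waf                       # host-visible endurance

# The same hypothetical 64-GB part under two access patterns:
print(host_tbw(64, 3000, waf=1.1))  # mostly sequential writes -> ~174.5 TBW
print(host_tbw(64, 3000, waf=4.0))  # small random writes      -> 48.0 TBW
```

The physical write budget is identical in both cases; only the host's access pattern changes, which is why a single TBW number without a stated workload says little.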
11. Since eMMC and UFS are JEDEC standards, performance and reliability are about the same between different vendors’ parts.
While managed flash (eMMC/UFS) will take care of fundamental tasks, the actual implementation can vary significantly from vendor to vendor. For instance, the frequency and algorithm by which garbage collection is performed needs to be optimized, since each operation temporarily reduces performance and adds write amplification. How wear leveling is performed, for example when the device is partitioned between native and enhanced mode, is another factor impacting the life of the device.
There are often important differences in performance and reliability, and tradeoffs need to be considered when comparing different vendors’ eMMC and UFS devices. Typically, suppliers who develop their own eMMC or UFS controllers in-house have better results, because they’re able to optimize their controller to work with their latest generation of flash.