Flash SSD Endurance and Reliability: Four Influential Factors
Have you ever wondered what makes the difference between the lifetime of traditional data storage solutions and state-of-the-art technology? Or why some NAND flash solid-state drives (SSDs) live up to their promised lifespan and others don’t? The differences all come down to a few primary elements, which we’ll explore in this article.
Four Indicators of SSD Excellence
Finding an SSD that suits your needs is like shopping for any other product: you have a wide range of options that run the gamut from inexpensive (but lower quality) to very pricey (with excellent quality). It can be a challenge to find the right balance of quality and cost. When evaluating your options, there are four things to consider: performance, power utility, endurance, and reliability.
Performance represents the efficiency of reads and writes to the drive. It’s typically defined by the drive’s throughput, latency, and consistency. Sometimes people speak of performance as speed, and while that’s partially accurate, it’s not the whole story.
Power utility represents how much power the SSD consumes. SSDs don’t use a lot of power, but the amount can vary depending on a range of factors, such as form factor, type of NAND flash (SLC vs. TLC, for instance), the SSD controller and its capabilities and features, the number of channels in the module, and how often data is read from or written to storage.
Endurance is the factor that most closely aligns with the drive’s lifespan. It measures an SSD’s longevity in several ways:
- Terabytes written (TBW), or how many terabytes of data you can save to the drive before it reaches the end of its life.
- Drive writes per day (DWPD), or how many times you can fill the entire drive per day over a period set by the manufacturer, typically a warranty period of three to five years.
- Program/erase (P/E) cycles, or how many times you can write and erase data in a cell before it begins to fail.
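TBW and DWPD describe the same write budget from two angles, so one can be derived from the other once the drive capacity and the rating period are known. The sketch below uses the commonly quoted relationship TBW = DWPD × capacity × 365 × years; the function names and the example drive (1 TB, 600 TBW, five years) are illustrative assumptions, not figures from any particular datasheet.

```python
def tbw_from_dwpd(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes written implied by a DWPD rating over the rating period."""
    return dwpd * capacity_tb * 365 * warranty_years


def dwpd_from_tbw(tbw: float, capacity_tb: float, warranty_years: float) -> float:
    """Drive writes per day implied by a TBW rating over the rating period."""
    return tbw / (capacity_tb * 365 * warranty_years)


# Hypothetical 1 TB drive rated for 600 TBW over five years.
print(round(dwpd_from_tbw(600, 1.0, 5), 3))  # 0.329 drive writes per day
# The same drive rated at 1 DWPD instead would imply this many terabytes written:
print(tbw_from_dwpd(1.0, 1.0, 5))  # 1825.0
```

Comparing drives on one common metric this way avoids being misled by a large TBW figure that simply reflects a larger capacity.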
Reliability, or the overall trustworthiness of the SSD, is more difficult to quantify. However, manufacturers typically use the same metrics here as they do for endurance: TBW and DWPD. A recent study tested SSD reliability and presented results using a metric of annual failure rate (AFR). Reliability also addresses data retention, bit errors, and other similar issues.
The four factors listed above are critical elements of any SSD, and they can directly affect each other. For instance, an SSD’s workload affects endurance. Power consumption and the heat that accompanies it can impact endurance and performance, and so on.
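The study’s exact method isn’t described here, but AFR is commonly computed as the number of failures per drive-year of operation, expressed as a percentage. The sketch below assumes that definition; the function name and the fleet numbers are hypothetical.

```python
def annual_failure_rate(failures: int, drive_days: float) -> float:
    """Annualized failure rate in percent: failures per drive-year of operation."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return 100 * failures / drive_years


# Hypothetical fleet: 1,000 drives running for a full year, 10 of which failed.
print(annual_failure_rate(failures=10, drive_days=1000 * 365))  # 1.0
```

Counting drive-days rather than drives keeps the figure meaningful when drives are added to or removed from the fleet partway through the year.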
When it comes to SSD lifespan, though, the two most important elements are endurance and reliability. Let’s take a closer look at them.
NAND Flash Endurance
As mentioned above, the number of program/erase (P/E) cycles a NAND flash module can handle essentially represents its lifespan. To better understand P/E cycles, it’s important to know how NAND flash processes and stores data.
SSDs store information in containers, or flash-memory cells, which are grouped into pages and blocks. Data is written to the cells by electrically charging them. Through a physical phenomenon called Fowler-Nordheim tunneling (F-N tunneling), data in the form of electrons can be “squeezed” into and out of a memory cell through its insulation layer, which can be thought of as the lid on top of each container.
When you want to change a piece of information in a memory cell that’s filled up, the drive can’t simply write the edited data over the original data already stored in a block. It must completely erase the original data and then write the updated information to the now-empty block. This process is called a program/erase, or P/E, cycle.
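The erase-before-write rule can be illustrated with a toy model. The sketch below assumes the usual NAND organization in which pages are programmed individually but erased together as a block; the FlashBlock class and its methods are hypothetical and greatly simplified compared with real controller firmware.

```python
class FlashBlock:
    """Toy model of one NAND block: a page can be programmed only while erased."""

    def __init__(self, pages: int = 4):
        self.pages = [None] * pages  # None marks an erased page
        self.pe_cycles = 0

    def program(self, page: int, data: bytes) -> None:
        if self.pages[page] is not None:
            raise RuntimeError("page already programmed; erase the block first")
        self.pages[page] = data

    def erase(self) -> None:
        """Erase the whole block; each erase counts as one P/E cycle."""
        self.pages = [None] * len(self.pages)
        self.pe_cycles += 1

    def update(self, page: int, data: bytes) -> None:
        """Change one page: keep the other pages, erase, then reprogram."""
        saved = list(self.pages)
        saved[page] = data
        self.erase()
        for index, content in enumerate(saved):
            if content is not None:
                self.program(index, content)


block = FlashBlock()
block.program(0, b"old")
block.update(0, b"new")  # overwriting is impossible, so this costs one P/E cycle
print(block.pages[0], block.pe_cycles)  # b'new' 1
```

Calling program() twice on the same page without an erase raises an error, which mirrors why every change to stored data eventually consumes part of the drive’s P/E budget.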
Each SSD has a limited number of P/E cycles. Every cycle degrades the memory cells a little bit by eroding the insulation layer of the cell. That damage builds up over time until the cells become too worn down and can no longer hold electrons.
Most SSDs can handle anywhere from several thousand to tens of thousands of P/E cycles. A drive’s maximum P/E cycle count depends on its design and on other factors such as bit density (see figure). Newer technology like quad-level-cell (QLC) NAND offers higher bit density (four bits per cell, as opposed to one bit per cell in single-level-cell, or SLC, NAND), but QLC drives have a lower P/E cycle count than SLC drives. That’s because each extra bit doubles the number of charge levels a cell must distinguish (16 for QLC versus 2 for SLC), so programming becomes exponentially more difficult with high-bit-density designs.
In real-world use, average consumers aren’t expected to run into endurance problems within the typical warranty period of three to five years. However, for enterprise-grade solutions or those with ultra-high endurance demands, SSD vendors either opt for NAND flash with high endurance ratings or generously overprovision the module with spare NAND flash so that it can tolerate extreme workloads.
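The link between bit density and programming difficulty comes down to simple arithmetic: a cell that stores n bits has to hold one of 2^n distinguishable charge levels. The short sketch below tabulates this for the common cell types; the names CELL_TYPES and charge_levels are illustrative.

```python
# Bits stored per cell for the common NAND cell types.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_levels(bits_per_cell: int) -> int:
    """Number of distinct charge states needed to encode the given bits."""
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_levels(bits)} charge levels")
```

With more levels packed into the same voltage range, the margin between neighboring levels shrinks, so a worn cell becomes unreadable sooner.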
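A back-of-the-envelope estimate shows why consumer workloads rarely exhaust a drive and how extra NAND helps. The sketch below assumes perfectly even wear across all cells and uses a write-amplification factor (NAND writes divided by host writes), a concept not covered in this article; all function names and figures are hypothetical.

```python
def overprovisioning_pct(physical_gb: float, user_gb: float) -> float:
    """Spare NAND as a percentage of the capacity visible to the user."""
    return 100 * (physical_gb - user_gb) / user_gb


def years_until_worn_out(
    physical_tb: float,
    pe_cycles: int,
    host_writes_tb_per_day: float,
    write_amplification: float = 1.0,  # assumed ratio of NAND writes to host writes
) -> float:
    """Rough lifetime estimate, assuming wear is spread evenly over all NAND."""
    total_nand_writes_tb = physical_tb * pe_cycles
    nand_writes_tb_per_day = host_writes_tb_per_day * write_amplification
    return total_nand_writes_tb / nand_writes_tb_per_day / 365


# Hypothetical module: 128 GB of NAND exposing 100 GB to the user.
print(overprovisioning_pct(128, 100))  # 28.0

# Hypothetical consumer drive: 1 TB of NAND, 3,000 P/E cycles, 50 GB written per day.
print(round(years_until_worn_out(1.0, 3000, 0.05, write_amplification=2.0), 1))  # 82.2
```

Raising host_writes_tb_per_day to enterprise levels shrinks the estimate quickly, which is where higher-endurance NAND or more spare capacity becomes necessary.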
Reliability Issues in SSDs
While most NAND flash-memory cells live up to their promised endurance limit, it’s possible a cell still may suffer data losses before its lifespan is complete. One of the root causes of such a phenomenon is that data can leak out over time (in the form of electrons) through the memory cell’s insulation layer. This leakage is a type of reliability malfunction called a data-retention issue.
Data retention is a measure of how long an SSD can store its data without the data becoming corrupted. It’s affected most significantly by the SSD’s temperature and the percentage of its P/E cycles already used.
For example, data retention for a multilevel-cell (MLC) SSD might be expressed as “2 years with up to 3% P/E cycles at 85°C.” That means that the SSD can be expected to retain data for two years if it’s stored at 85 degrees Celsius and is only 3% through its maximum P/E cycle count. Data-retention issues are rather critical because they can occur anytime, even within unused memory cells.
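A retention rating like the one above is really a three-part condition: a duration, a wear limit, and a temperature limit. The sketch below encodes it as a small data structure and checks whether a given situation falls inside the rated envelope, assuming that cooler storage and lower wear are never worse for retention. The RetentionSpec class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionSpec:
    years: float            # rated retention time
    max_pe_used_pct: float  # highest share of P/E cycles already consumed
    max_temp_c: float       # highest storage temperature in degrees Celsius

    def covers(self, years: float, pe_used_pct: float, temp_c: float) -> bool:
        """True if the situation lies inside the rated envelope."""
        return (
            years <= self.years
            and pe_used_pct <= self.max_pe_used_pct
            and temp_c <= self.max_temp_c
        )


# "2 years with up to 3% P/E cycles at 85°C"
spec = RetentionSpec(years=2, max_pe_used_pct=3, max_temp_c=85)
print(spec.covers(years=1, pe_used_pct=2, temp_c=40))   # True
print(spec.covers(years=1, pe_used_pct=50, temp_c=40))  # False
```

A False result only means the situation is outside what this rating promises; it does not predict that data will actually be lost.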
Aside from the gradual wearing down over time, data-retention issues can arise from the simple act of reading from memory cells. “Read disturb” sometimes happens when cells become unstable due to the reading operations being conducted on neighboring cells.
Because of the way the electrical wiring is designed in NAND flash, cells are inevitably triggered slightly when their neighbors are read repeatedly within a short period of time. These chained triggering effects can cause a glitch in the cell’s programming and affect how it stores data, resulting in data loss or a change in the data.
To avoid such side effects of read operations, firmware designers can program the drive so that the time interval between consecutive reads of each memory cell is longer and more controlled.
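One way to picture this kind of read pacing is a small tracker that remembers when each region was last read and reports how long the next read should wait. The sketch below is only an illustration of the idea, not how any particular firmware works; the ReadSpacer class, its per-block granularity, and the 0.5-second interval are assumptions.

```python
class ReadSpacer:
    """Tracks the last read time per block and reports the wait before the next read."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_read = {}  # block number -> time of the most recent read

    def delay_before_read(self, block: int, now: float) -> float:
        """Seconds to wait so reads of the same block stay min_interval_s apart."""
        last = self._last_read.get(block)
        if last is None:
            return 0.0
        return max(0.0, self.min_interval_s - (now - last))

    def record_read(self, block: int, at: float) -> None:
        self._last_read[block] = at


spacer = ReadSpacer(min_interval_s=0.5)
spacer.record_read(block=7, at=10.0)
print(spacer.delay_before_read(block=7, now=10.25))  # 0.25
print(spacer.delay_before_read(block=8, now=10.25))  # 0.0, block 8 has not been read yet
```

Passing the current time in explicitly keeps the sketch deterministic; a real implementation would read a hardware timer instead.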
Conclusion
As SSDs and other NAND flash devices continue to be the popular choice for modern storage, it’s important to understand what’s behind the technology. While NAND flash has a limited life and some flaws that can affect endurance and reliability, ultimately it’s up to the storage-device controller IC and its firmware to compensate for them and deliver products that are fit for today’s needs.
Because various factors can dictate the lifespan of today’s SSDs, choosing drives with advanced hardware and firmware designs that are built for endurance and reliability can resolve many potential issues.