[Design View / Design Solution]
Overcome Barriers To Broad-Based SSD Adoption In The Enterprise
Designers must solve poor write endurance and write performance, high error rates, and security issues before enterprise server and laptop makers embrace solid-state drives on a wide scale.
Alex Naqvi
ED Online ID #21087
May 7, 2009
Copyright © 2006 Penton Media, Inc., All rights reserved. Printing of this document is for personal use only.
At first glance, solid-state drives
(SSDs) appear to be a no-brainer
for makers of storage systems
for enterprise servers and
laptops. After all, SSDs promise higher
read/write performance, higher reliability,
and lower power consumption compared
to hard-disk drives (HDDs). But in practice,
SSD adoption has been held back
not only by a higher cost per gigabyte,
but also by real-world issues that prevent
them from achieving their performance
and reliability promises.
Last year saw the proliferation of SSD
product announcements in enterprise-class
servers and laptops, beginning
with 32- and 64-Gbyte devices. In servers,
we’ve seen SSDs used in so-called
“Tiered Storage” systems, in which the
SSD acts as a higher-speed intermediary
between system RAM and hard-disk-drive
(HDD) storage. Fujitsu and other
vendors have also begun to use SSDs
in enterprise-class laptops, touting their
ruggedness and higher read performance.
With SSDs now in the marketplace, two
key trends have emerged:
- Declining costs: As NAND flash manufacturers
have continued to advance
technology and densities, the price per
gigabyte from NAND flash vendors has
dropped approximately three orders of
magnitude over the last decade, starting
from thousands of dollars in the 2000
timeframe to today’s commodity price
levels of around $1 per gigabyte (for
MLC-based, or multi-level cell, technologies).
Continued price declines are
expected for years to come.
- Increasing performance: At the same
time, advances in flash memory as well
as techniques such as DRAM caching
are driving input/output operations per
second (IOPS) higher, with today’s fastest
SSDs sporting tens of thousands of
read IOPS.
SSD CHALLENGES
Despite these advances, most analysts
predict a very slow ramp-up toward
broad-based adoption of SSDs in enterprise-class
servers and laptops. One key
reason is the relatively high cost per gigabyte
for SSDs (compared with HDDs).
Today’s SSDs mainly use single-level
cell (SLC) memory due to its higher life
expectancy and reliability.
The cost of SLC memory is roughly
four times higher than MLC memory due
to two factors. First, MLC memory stores
two bits per cell and therefore provides
twice the storage per square millimeter
of silicon (the main cost of the memory).
Second, the volume of MLC is roughly
90% of all NAND flash, further increasing
the economies of scale in its production.
Unfortunately, MLC flash memory isn’t
yet deemed reliable or durable enough for
widespread enterprise use.
Nevertheless, MLC flash is clearly the
way forward due to its ability to rapidly
reduce the cost per gigabyte. Still, several
challenges must be overcome when using
MLC flash in its current implementation.
For example, MLC flash offers poor
write endurance. NAND flash memory
can only be written a certain number of
times to each block (or cell). SLC memory
generally sustains 100,000 program/erase
(P/E) cycles, while MLC memory generally
sustains one-tenth as many, at 10,000
cycles. Once a block (or cell) is written
to its limit, the block starts to forget what
is stored.
Today’s SSDs are different from HDDs
when it comes to data storage. HDDs can
take the data directly from the host and
write it to the rotating media. In contrast,
SSDs can’t write a single bit of information
without first erasing and then rewriting
very large blocks of data at one time
(also referred to as P/E). In addition, to
maximize the life of the flash memory,
a technique to level the wear across all
blocks equally forces the SSD controller
to constantly move data around on the
flash memory.
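The effect of wear leveling can be illustrated with a minimal sketch. It assumes a hypothetical controller that directs each write to the block with the fewest erases so far, and a hypothetical workload in which 90% of writes would otherwise land on 10% of the blocks. The function name `simulate_wear`, the block count, and the write count are illustrative and not taken from any real SSD firmware.

```python
import random

def simulate_wear(num_blocks, num_writes, level_wear, seed=0):
    """Return per-block erase counts after num_writes whole-block rewrites.

    Hypothetical model (assumes num_blocks >= 2): without wear leveling,
    90% of writes hit a "hot" region made up of 10% of the blocks. With
    wear leveling, every write goes to the least-erased block instead.
    """
    rng = random.Random(seed)
    erase_counts = [0] * num_blocks
    hot_blocks = max(1, num_blocks // 10)
    for _ in range(num_writes):
        if level_wear:
            # Pick the block with the lowest erase count so far.
            target = min(range(num_blocks), key=erase_counts.__getitem__)
        elif rng.random() < 0.9:
            target = rng.randrange(hot_blocks)
        else:
            target = rng.randrange(hot_blocks, num_blocks)
        erase_counts[target] += 1
    return erase_counts

unleveled = simulate_wear(100, 10_000, level_wear=False)
leveled = simulate_wear(100, 10_000, level_wear=True)
print("max erases without leveling:", max(unleveled))
print("max erases with leveling:   ", max(leveled))
```

In this model the leveled case spreads the 10,000 writes evenly at 100 erases per block, while the unleveled case concentrates wear on the hot blocks. The sketch ignores the cost of relocating existing data, which is the part of real wear leveling that adds extra flash writes.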
These factors and other differences from
HDDs give rise to write amplification,
which can rise to a factor of 100 times the
amount of user data actually being stored.
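As a rough illustration of how write amplification arises, the sketch below assumes a naive controller that must erase and reprogram an entire erase block to change any part of it. The 4-Kbyte host write and 512-Kbyte erase block are assumed sizes chosen for illustration, not figures from this article, and the function names are hypothetical.

```python
def write_amplification(host_bytes, flash_bytes):
    """Ratio of bytes physically programmed to flash vs. bytes the host wrote."""
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    return flash_bytes / host_bytes

def naive_rewrite_cost(host_write_bytes, erase_block_bytes):
    """Flash bytes programmed when every touched block is rewritten whole.

    Assumes the host write starts on an erase-block boundary.
    """
    # Ceiling division: round up to a whole number of erase blocks.
    blocks_touched = -(-host_write_bytes // erase_block_bytes)
    return blocks_touched * erase_block_bytes

host = 4 * 1024       # assumed 4-Kbyte host write
block = 512 * 1024    # assumed 512-Kbyte erase block
flash = naive_rewrite_cost(host, block)
print(write_amplification(host, flash))  # 128.0
```

Real controllers avoid this worst case by remapping writes to pre-erased pages, but garbage collection and wear leveling still add flash writes beyond what the host requested.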
Consequently, these factors also limit
the life expectancy of the SSD. Figure 1 shows the basic life expectancy formula
that affects all SSDs. Figure 2 shows the
details of the formula. A typical MLC
drive might have the characteristics shown
in Figure 3, where:
Capacity = 128 Gbytes
P/E cycles = 10,000
Write speed from the host = 125 Mbytes/s
Duty cycle (when the drive is accessed for
reads or writes) = 40% of the time
Read:write ratio (percentage of accesses
to the drive that are writes rather than
reads) = 33%
Write amplification (assuming a conservative
number) = 40
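The figures are not reproduced in this text, so the following sketch assumes the life-expectancy formula has this form: total flash write capacity (capacity × P/E cycles) divided by the effective rate at which flash is written (host write speed × duty cycle × write fraction × write amplification). The function name and the convention of 1 Gbyte = 1024 Mbytes are assumptions; under them, the parameters listed above work out to roughly 23 days.

```python
SECONDS_PER_DAY = 86_400

def ssd_life_days(capacity_gbytes, pe_cycles, write_mbytes_per_s,
                  duty_cycle, write_fraction, write_amplification):
    """Estimate days until every block has used up its P/E cycles.

    Assumes perfect wear leveling and 1 Gbyte = 1024 Mbytes.
    """
    # Total Mbytes the flash can absorb over its life.
    total_mbytes = capacity_gbytes * 1024 * pe_cycles
    # Average Mbytes/s actually programmed to flash.
    flash_rate = (write_mbytes_per_s * duty_cycle
                  * write_fraction * write_amplification)
    return total_mbytes / flash_rate / SECONDS_PER_DAY

days = ssd_life_days(128, 10_000, 125, 0.40, 0.33, 40)
print(round(days))  # 23
```

Because write amplification sits in the denominator, halving it doubles the estimated life, which is why it matters so much for endurance.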
Under these assumptions, the drive would wear out in about 23 days, clearly too short a lifespan to deploy in an enterprise
environment. To overcome the endurance problem, SSD
manufacturers use one or more of these five techniques:
- Combining MLC and SLC flash on the same device, which
extends endurance by storing more active data on the higher-endurance
SLC memory while still lowering the total cost by
using some MLC memory.
- Over-provisioning, which extends endurance by making more
flash available. For example, an SSD with twice as much actual
storage as its stated capacity would have twice the endurance
as a drive in which flash and capacity had a 1:1 ratio (no over-provisioning).
Of course, this over-provisioning would also
double the cost.
- DRAM caches, which extend endurance by aggregating some
writes before sending them to the flash memory and by handling
other housekeeping in DRAM (rather than in the flash memory). Naturally,
the DRAM also adds costs.
- Daily write limitations, which extend the life of the drive by
restricting the number of writes to the flash each day. For
instance, one vendor’s warranty specifies a limit of 20 Gbytes
per day written from the host, which can be reached in less than
five minutes on that same drive.
- Reduced warranties (less than five years), which account for lower
endurance by simply reducing the guaranteed life of the drive.
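The arithmetic behind the daily write limit is easy to check. The sketch below assumes a drive that sustains the 125-Mbyte/s host write speed used in the earlier example (the vendor's actual drive speed is not stated here) and 1 Gbyte = 1024 Mbytes. The function name is illustrative.

```python
def minutes_to_reach_limit(limit_gbytes, write_mbytes_per_s):
    """Minutes of continuous writing needed to hit a daily write limit."""
    seconds = limit_gbytes * 1024 / write_mbytes_per_s
    return seconds / 60

minutes = minutes_to_reach_limit(20, 125)
print(round(minutes, 1))  # 2.7
```

At that assumed speed, a 20-Gbyte daily allowance is used up in under three minutes of sustained writing, consistent with the "less than five minutes" figure above.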
While random read performance in a typical MLC-based SSD
can get up to 10,000 to 20,000 IOPS, random write performance
is significantly less. Even a so-called “high-performance” SSD
today delivers fewer than 1000 IOPS of write performance
(Fig. 4). This is generally caused by a high write amplification
factor and by a need to restrict writes to extend the drive’s endurance.
Typically, SSD makers address the write performance issue
with two methods.
One is by adding DRAM caches, described earlier. However,
this isn’t a long-term solution because it only speeds up writes
until the cache is full (the first few minutes of use, at best). In any
event, adding the gigabyte or more of DRAM cache that’s really
needed to impact performance would make the SSD too expensive
to sell.
The other method is to over-provision the drive. This gives the
SSD controller more room to manipulate the data and reduces
the amount of time the drive is doing housekeeping operations,
e.g., garbage collection (performed on blocks of data no longer
needed by the host).
NAND flash memory, like other memory chips, has naturally
occurring defects that render portions of die unusable. Most
SSDs provide error protection for up to one sector for each 10^15
bits read. Assuming a 250-Mbyte/s read speed and a 40% operating
duty cycle, a sector error would result on average every 14.4
days based on the following formula:
10^15 bits/(8 bits/byte × 250 Mbytes/s × 40%) = 14.4 days
In contrast, high-performance HDDs offer error protection for
up to one sector for each 10^16 bits read, but they’re transferring
data much more slowly than an SSD. Assuming a 120-Mbyte/s
read speed at the same operating duty cycle, a sector error
would result on average every 9.9 months based on the following
formula:
10^16 bits/(8 bits/byte × 120 Mbytes/s × 40%) = 9.9 months
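Both estimates follow from the same arithmetic, which can be sketched as below. The function name, the use of decimal megabytes (10^6 bytes), and the average month length of 30.44 days are assumptions made for illustration.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30.44  # assumed average month length

def days_between_sector_errors(bits_per_error, read_mbytes_per_s, duty_cycle):
    """Average days between unrecoverable sector errors.

    bits_per_error: bits read per expected sector error (e.g. 1e15).
    read_mbytes_per_s: read speed in decimal Mbytes/s.
    duty_cycle: fraction of time the drive is actually transferring data.
    """
    bits_per_second = 8 * read_mbytes_per_s * 1e6 * duty_cycle
    return bits_per_error / bits_per_second / SECONDS_PER_DAY

ssd_days = days_between_sector_errors(1e15, 250, 0.40)
hdd_days = days_between_sector_errors(1e16, 120, 0.40)
print(f"SSD: {ssd_days:.1f} days")
print(f"HDD: {hdd_days / DAYS_PER_MONTH:.1f} months")
```

Under these assumptions the results land within rounding of the 14.4-day and 9.9-month figures quoted above. The HDD's longer interval comes from its tenfold better error specification and, to a lesser extent, its slower transfer rate.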
To address this problem, SSD manufacturers either reduce the
warranty period for their devices or they leave it to redundant
array of independent disks (RAID) logic in the host computer
system. Shorter warranties aren’t acceptable to most mainstream
users, and using host-based RAID causes a high number of
rebuilds and further reduces the SSD’s performance. And, of
course, in a notebook environment, RAID isn’t a good option.
One of the stealth issues impacting the use of SSDs is the lack
of encryption in a typical drive. For most products on the market
today, data can be recovered from the SSD by simply removing
the SSD cover and attaching a clip to the flash-memory chips—a
process far easier than trying to read the contents of a password-protected
HDD that’s been removed from its host. Enterprises
will demand much better security guarantees, though, before they use SSDs in large quantities, especially
in laptops.
REQUIREMENTS FOR BROAD SSD ADOPTION
The fundamental cost issue with today’s
SSDs can be largely overcome with the
use of MLC flash, but that flash must be
made reliable enough and deliver enough
performance to be practical for use in
enterprise servers and laptops. There are
five requirements for doing so:
- Better write endurance: The industry
must develop new techniques to reduce
the write amplification in MLC-based
SSDs (thereby increasing the endurance)
to meet the five-year expected
life of enterprise-class HDDs, and it
must do so without imposing daily write
limitations, shorter warranties, or costly
DRAM caches.
- Better write performance: MLC-based
SSDs should perform like HDDs—there
should be no difference between write
and read performance. This will also
require significantly reducing the write
amplification factor imposed by today’s
SSD technology.
- Lower error/defect rates: Error rates and
error-correcting code (ECC) protection
for MLC-based SSDs must be better
than they are for today’s enterprise HDDs,
without over-provisioning or relying on
system-level RAID techniques.
- Full security: SSDs will need some form
of built-in encryption to prevent data
theft before enterprises will trust their
use in laptops.
- Reduced complexity, size, and costs:
SSD designs must eliminate DRAM and
combinations of SLC and MLC to reduce
packaging complexity, size, and costs.
The market is more than ready for practical
SSD storage devices, just as soon as
SSD manufacturers overcome the challenges
of performance, endurance,
and complexity and can offer fast and reliable
devices at reasonable prices. Advances
taking place today will address these
key issues, leading to a very bright future
for MLC-based SSDs.
ALEX NAQVI, president and CEO of
SandForce Inc., received an MSEE from
Oregon State University, Corvallis.