[Design Application]
Embedded Memories Are The Key To Unleashing The Power Of SoC Designs
Smaller device geometries and improved compilers let designers integrate powerful combinations of memory and logic.
The semiconductor industry continues to validate Moore's Law, the popular form of Gordon Moore's prediction that device densities (and, with them, speeds) double roughly every 18 months. With process technologies now pushing well into the deep-submicron arena, IC designers have finally reached the point where they can integrate significant densities of memory and logic on the same chip. In doing so, they have ushered in the system-on-a-chip (SoC) era.
Integrating memory on-chip isn't a new concept, of course. Microcontroller designers have done it for years. But the emergence of multimillion-gate ASIC designs has increased the demand for a wide range of embedded memory options and applications. Over the past few years, the prospects for embedding DRAM and flash memory on-chip have received a great deal of attention. But developing a single process to maximize the performance of both memory and logic circuits has been an ongoing struggle.
In the highly anticipated embedded DRAM arena, for instance, developers have faced an inherent contradiction between the demand to maximize device and interconnect performance for logic circuits and the need to maximize retention time and reduce cost for DRAM circuits. Integrating the two technologies into a single process without expensive additional mask layers has proven more difficult than expected.
IC designers haven't faced the same process constraints with embedded SRAM. Because it is built on the same process used for logic, embedded SRAM, long used to accelerate high-end network routers and switches, requires no extra masking steps. And while an SRAM cell is larger than a DRAM cell (typically six transistors versus one transistor and a capacitor), new technologies are emerging to help boost embedded SRAM density. So despite recent advances in embedded DRAM and flash memory processes, embedded SRAM remains the workhorse of ASIC memory designs.
The key to embedded SRAM performance is memory compiler design, and each new process generation has confronted compiler designers with unprecedented challenges. A memory compiler exploits the regular structure of memory. Memories are built from four basic building blocks: the memory array, the predecoder, the decoder, and the column-select and I/O section. The memory array is formed by tiling a single memory core cell (Fig. 1), and each of the other three blocks is likewise assembled from a basic leaf cell. The compiler creates a memory by placing instances of the different leaf-cell types to achieve the desired memory width and depth.
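To make the tiling idea concrete, here is a minimal Python sketch of the bookkeeping a compiler performs when it sizes a memory. It assumes a single-port SRAM, a hypothetical column-mux factor (`mux`) that places several words on each physical row, 2-to-4 predecoder leaf cells, one decoder leaf cell per row, and one column-select/I/O leaf cell per data bit. The function and field names are illustrative only, not any vendor's compiler interface.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafCellCounts:
    rows: int
    columns: int
    core_cells: int
    predecoders: int
    row_decoders: int
    io_cells: int


def compile_memory(words: int, bits: int, mux: int = 4) -> LeafCellCounts:
    """Count leaf-cell instances for a words x bits single-port SRAM.

    Hypothetical floorplan: `mux` words share each physical row, so the
    core array is (words // mux) rows by (bits * mux) columns.
    """
    if words <= 0 or bits <= 0 or mux <= 0:
        raise ValueError("words, bits, and mux must be positive")
    if words % mux:
        raise ValueError("words must be a multiple of the column-mux factor")
    rows = words // mux
    if rows & (rows - 1):
        raise ValueError("row count must be a power of two")
    columns = bits * mux
    row_addr_bits = rows.bit_length() - 1  # log2(rows) for a power of two
    return LeafCellCounts(
        rows=rows,
        columns=columns,
        core_cells=rows * columns,                 # memory array: one core cell per bit
        predecoders=math.ceil(row_addr_bits / 2),  # assumed 2-to-4 predecoder leaf cells
        row_decoders=rows,                         # one decoder/wordline-driver cell per row
        io_cells=bits,                             # one column-select + I/O cell per data bit
    )
```

For example, `compile_memory(1024, 32, mux=4)` describes a 256-row by 128-column array of 32,768 core cells, plus 4 predecoder, 256 decoder, and 32 I/O leaf cells. A production compiler typically also generates the layout, netlist, and timing views for those instances.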
To address the ever-increasing demands of ASIC designs, memory compiler developers must continually improve density, performance, and power consumption with each new technology generation. The best results in all three areas come when the leaf cells and the memory core cell are optimized for both the process technology and the target memory-size range. For example, LSI Logic Corp. has developed SRAM compilers optimized for different memory-size ranges, along with memory core cells that combine the highest drive strength and the smallest size for its G12 0.18-µm technology.
Over time, new challenges have also forced compiler architects to adapt. Compiler designs are now optimized to meet the demands of a wide range of applications. Segmented, or block, architectures, which activate only the block being accessed, are deployed to improve performance and reduce power consumption. And SoC cores are designed with tightly coupled memories to overcome the processor-to-memory bottleneck.
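One way to picture a block architecture is through address decoding: the high-order address bits pick a block, and only that block's wordline and bitlines are driven. The Python sketch below assumes a hypothetical field layout in which the low-order bits drive the column mux, the middle bits select a row within a block, and the high-order bits select the block. Real compilers may order or encode these fields differently, and the names here are illustrative.

```python
from typing import NamedTuple


class DecodedAddress(NamedTuple):
    block: int   # block (segment) to enable
    row: int     # wordline within the selected block
    column: int  # column-mux select


def decode(addr: int, blocks: int, rows_per_block: int, mux: int) -> DecodedAddress:
    """Split a flat word address for a hypothetical block-partitioned SRAM.

    Low-order bits select the column mux, middle bits the row within a block,
    and high-order bits the block. Only the selected block's wordline and
    bitlines are activated; the other blocks stay idle.
    """
    words = blocks * rows_per_block * mux
    if not 0 <= addr < words:
        raise ValueError("address out of range")
    column = addr % mux
    row = (addr // mux) % rows_per_block
    block = addr // (mux * rows_per_block)
    return DecodedAddress(block, row, column)
```

For a 1,024-word memory split into four 64-row blocks with a 4:1 column mux, `decode(300, 4, 64, 4)` selects block 1, row 11, column 0. Under a first-order model in which bitline capacitance scales with the number of cells on a bitline, four 64-row blocks switch roughly a quarter of the bitline capacitance of one 256-row array per access, before counting the overhead of the block-select logic.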
As more complex system functions moved onto a single chip, the memory compiler had to evolve to support more memory-subsystem features as well. Today's embedded memories often offer multiple ports, synchronous or asynchronous operation, and stringent power control (Fig. 2).
Perhaps the greatest challenge facing embedded SRAM developers, though, is satisfying the growing demand to embed ever-larger memories on-chip. Over the past few years, the amount of embedded memory available to ASIC designers has grown rapidly, from 1 Mbit in 0.35-µm process technology to 2.5 Mbits in 0.25-µm processes and, more recently, to 6 to 8 Mbits in 0.18-µm technology. Larger memories, in turn, have dramatically complicated testing. To meet these test requirements, current embedded memories typically include built-in scan latches and a scan path, as well as a built-in self-test (BIST) logic wrapper.
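The article does not say which algorithm a BIST wrapper runs. As an assumption, the sketch below uses March C-, a widely used march test that performs 10 operations per cell (10N in total) and detects stuck-at, transition, and many coupling faults. `FaultyMemory` is a hypothetical 1-bit-wide memory model with injected stuck-at faults; the scan path is not modeled.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A march element: (ascending?, [(operation, value), ...]); "w" writes, "r" reads and compares.
MARCH_C_MINUS: List[Tuple[bool, List[Tuple[str, int]]]] = [
    (True,  [("w", 0)]),            # either order: write 0
    (True,  [("r", 0), ("w", 1)]),  # ascending: read 0, write 1
    (True,  [("r", 1), ("w", 0)]),  # ascending: read 1, write 0
    (False, [("r", 0), ("w", 1)]),  # descending: read 0, write 1
    (False, [("r", 1), ("w", 0)]),  # descending: read 1, write 0
    (True,  [("r", 0)]),            # either order: read 0
]


def march_c_minus(n: int, read: Callable[[int], int],
                  write: Callable[[int, int], None]) -> Set[int]:
    """Run March C- over 1-bit cells 0..n-1 and return the failing addresses."""
    failures: Set[int] = set()
    for ascending, ops in MARCH_C_MINUS:
        order = range(n) if ascending else range(n - 1, -1, -1)
        for addr in order:
            for op, value in ops:
                if op == "w":
                    write(addr, value)
                elif read(addr) != value:
                    failures.add(addr)
    return failures


class FaultyMemory:
    """Hypothetical 1-bit-wide memory model with injected stuck-at faults."""

    def __init__(self, n: int, stuck_at: Optional[Dict[int, int]] = None) -> None:
        self.cells = [0] * n
        self.stuck_at = dict(stuck_at or {})

    def write(self, addr: int, value: int) -> None:
        # A stuck-at cell ignores writes and keeps its stuck value.
        self.cells[addr] = self.stuck_at.get(addr, value)

    def read(self, addr: int) -> int:
        return self.stuck_at.get(addr, self.cells[addr])
```

Running it on a 16-cell `FaultyMemory` with cell 5 stuck at 1 and cell 9 stuck at 0 reports `{5, 9}`; on a fault-free model it reports an empty set.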
Maintaining reasonable yields is critical to the development of cost-effective embedded memory designs, and a larger array is more likely to contain at least one defective cell. Accordingly, many ASIC manufacturers have gone a step further and integrated redundant rows and columns into their memory structures.
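Why redundancy pays off can be shown with a simple yield model. The sketch below assumes, for illustration only, that defects land on a memory according to a Poisson distribution with mean `mean_defects`, that each defect disables exactly one row, and that each spare row repairs one defect. Under those assumptions the memory is usable when it has no more defects than spares. Real defects also hit columns and peripheral circuits, so this overstates what spare rows alone can fix.

```python
import math


def repairable_yield(mean_defects: float, spare_rows: int) -> float:
    """Probability a memory is usable under a simple Poisson defect model.

    Assumes each defect disables exactly one row and each spare row fixes
    one defect, so yield = P(number of defects <= spare_rows).
    """
    term = math.exp(-mean_defects)  # P(K = 0)
    total = term
    for k in range(1, spare_rows + 1):
        term *= mean_defects / k    # P(K = k) computed from P(K = k - 1)
        total += term
    return total
```

With a hypothetical average of 0.5 defects per memory, the model gives a yield of about 61% with no spare rows, 91% with one, and 98.6% with two.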
Some employ soft built-in self-repair (BISR) schemes, in which the device identifies bad rows during a self-diagnostic routine and uses address-mapping logic to redirect accesses from those rows to good address space. While soft BISR strategies can improve yield and reduce cost, they have some significant limitations.
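The address-mapping step can be sketched as a small lookup table that BIST rebuilds at every power-up. In this minimal Python model, `SoftBisr`, the spare-row placement just above the normal rows, and the one-spare-per-faulty-row policy are all hypothetical; real soft BISR logic is a hardware comparator or CAM in the address path.

```python
from typing import Dict, Iterable


class SoftBisr:
    """Hypothetical soft-repair map, rebuilt from BIST results at every power-up."""

    def __init__(self, normal_rows: int, spare_rows: int) -> None:
        self.normal_rows = normal_rows
        self.spare_rows = spare_rows
        self.remap: Dict[int, int] = {}

    def power_up(self, faulty_rows: Iterable[int]) -> bool:
        """Rebuild the map from this power-up's BIST results.

        The map is volatile: nothing from a previous power-up survives.
        Returns False (and leaves no repairs) if faulty rows outnumber spares.
        """
        self.remap = {}
        faulty = sorted(set(faulty_rows))
        if len(faulty) > self.spare_rows:
            return False
        # Spare rows are assumed to sit just above the normal rows.
        self.remap = {row: self.normal_rows + i for i, row in enumerate(faulty)}
        return True

    def translate(self, row: int) -> int:
        """Address-mapping step in the access path; this lookup costs setup time."""
        return self.remap.get(row, row)
```

If a marginal row fails BIST on one power-up but passes on the next, the two `power_up` calls produce different maps, which illustrates the repeatability issue raised for soft BISR.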
A soft BISR solution can add up to 1.5 ns to address setup time, a significant liability for high-performance designs. Developers must be aware of this penalty and restrict soft BISR to applications that can tolerate it. Moreover, because the repair is recomputed at every power-up, its outcome depends on conditions at power-on, so a repair may not be repeatable from one power-up to the next. Finally, designers must account for the added power-on time: a soft BISR solution takes approximately 2 ms to run BIST, identify faulty rows, and repair them before the memory can be used.
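A quick calculation shows why 1.5 ns matters. This sketch assumes the address-setup path is the one that limits the clock and uses the article's 1.5-ns figure as a worst case; the clock rates and path delays are hypothetical.

```python
def setup_penalty_fraction(clock_mhz: float, penalty_ns: float = 1.5) -> float:
    """Fraction of one clock period consumed by the soft-BISR setup penalty."""
    period_ns = 1000.0 / clock_mhz
    return penalty_ns / period_ns


def max_clock_mhz(path_ns: float, penalty_ns: float = 1.5) -> float:
    """Highest clock rate if the address-setup path is critical and gains penalty_ns."""
    return 1000.0 / (path_ns + penalty_ns)
```

At a 200-MHz clock (5-ns period), the full 1.5 ns consumes 30% of the cycle. If a 4-ns address path were critical, the maximum clock would fall from 250 MHz to about 182 MHz.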
Recently, some ASIC vendors have migrated to a more sophisticated hard BISR concept, similar to the schemes used in standard DRAM parts, which rely on a fuse link and laser system to implement repair during manufacturing. In these schemes, an algorithm automatically programs a fuse field that directs the fuse box to substitute a good row address whenever the bad address arrives (Fig. 3). In a hard BISR scheme, the fuse register output is linked to the scan output of a faulty location analysis and repair execution (FLARE) unit. The enable signal is high only while the fuse data is read into the FLARE at power-up. A BISR operation mode loads information directly from the fuse bank into the FLARE register, so remapping can take place without rerunning the BIST.
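The contrast with soft BISR is that the repair information persists in fuses, and power-up only copies it into the FLARE register. The Python model below is a minimal sketch under stated assumptions: `FuseBank` and `FlareRegister` are hypothetical names, fuses are modeled as a one-time-writable table of (faulty row, spare index) pairs, spare rows are assumed to sit just above the normal rows, and the enable flag only mirrors the timing described above rather than gating real hardware.

```python
from typing import Dict, Iterable, Tuple


class FuseBank:
    """Hypothetical one-time-programmable fuse bank of (faulty_row, spare_index) pairs."""

    def __init__(self) -> None:
        self._entries: Tuple[Tuple[int, int], ...] = ()
        self._programmed = False

    def program(self, faulty_rows: Iterable[int]) -> None:
        """Blow fuses once, during manufacturing, from the repair analysis."""
        if self._programmed:
            raise RuntimeError("fuses can be programmed only once")
        faulty = sorted(set(faulty_rows))
        self._entries = tuple((row, i) for i, row in enumerate(faulty))
        self._programmed = True

    def read(self) -> Tuple[Tuple[int, int], ...]:
        return self._entries


class FlareRegister:
    """Toy model of a FLARE repair register reloaded from the fuse bank at power-up."""

    def __init__(self, fuses: FuseBank, normal_rows: int) -> None:
        self.fuses = fuses
        self.normal_rows = normal_rows
        self.remap: Dict[int, int] = {}
        self.enable = False

    def power_up(self) -> None:
        """BISR load mode: copy fuse data into the register; BIST is not rerun."""
        self.enable = True   # enable is high only while fuse data is read in
        self.remap = {row: self.normal_rows + i for row, i in self.fuses.read()}
        self.enable = False

    def translate(self, row: int) -> int:
        return self.remap.get(row, row)
```

Every power-up reproduces the same map from the same fuses, which is what makes the hard scheme repeatable and removes the BIST run from the power-on sequence.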