Overcome Traditional Memory-Speed Barriers With Embedded DRAM

Engineers can now run their designs at near-SRAM speeds without having to sacrifice DRAM's high bit density.

Date Posted: June 18, 2001 12:00 AM

Optimized Embedded DRAM
With embedded DRAM processes, there is an opportunity to radically improve random cycle time and displace SRAM in many applications. In general, DRAM memory arrays are made as large as possible to minimize die area and achieve the lowest cost. The number of memory cells associated with each bitline pair in a typical commodity DRAM array is usually 256 or 512, as determined by the cell-capacitance to bitline-capacitance ratio, to provide an adequate signal for sensing (Fig. 3).
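The link between cells per bitline and sense signal can be illustrated with the standard charge-sharing relation, in which the bitline swing equals the cell-to-precharge voltage difference scaled by C_cell / (C_cell + C_bitline). The sketch below is only an illustration: the capacitance and voltage values are assumed placeholders, not figures from this article, and the function names are hypothetical.

```python
def bitline_signal(c_cell_ff, c_bitline_ff, v_cell, v_precharge):
    """Bitline voltage swing after the cell capacitor shares charge with the bitline."""
    return (v_cell - v_precharge) * c_cell_ff / (c_cell_ff + c_bitline_ff)

# Assumed illustrative values (not taken from the article).
C_CELL_FF = 30.0         # storage capacitor, in fF
C_BL_PER_CELL_FF = 0.4   # bitline capacitance added per attached cell, in fF
V_DD = 2.5               # stored "1" level, bitline precharged to V_DD / 2

def signal_for(cells_per_bitline):
    """Sense signal for a bitline with the given number of attached cells."""
    c_bitline = C_BL_PER_CELL_FF * cells_per_bitline
    return bitline_signal(C_CELL_FF, c_bitline, V_DD, V_DD / 2)

for n in (128, 256, 512):
    print(f"{n} cells per bitline: {signal_for(n) * 1000:.0f} mV")
```

Under these assumptions, fewer cells per bitline means less bitline capacitance and therefore a larger signal at the sense amplifier.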

In commodity DRAMs, there typically are 2048 to 4096 bitline pairs in an array, which minimizes the overhead of the wordline (WL) drivers. Even though wordlines are strapped in metal, the RC time constant can be more than 5 ns. As a result, several dead clock cycles have been required between the activation and read commands in SDRAM to allow for the wordline rise time and cell signal propagation to the sense amplifier. Additional cycles are required at the end of the cycle between the precharge command and a subsequent activation command to allow for the RC delay associated with the wordline falling edge and bitline equalization.

As mentioned earlier, today's embedded DRAM processes provide high-performance logic devices (in addition to the slower DRAM transistors) that can be used in all DRAM circuits with the exception of the memory cell itself. This greatly speeds up the datapath. With only a modest increase in area, the arrays can be fragmented to shorten bitlines and wordlines, substantially reducing RC delays.

Figure 4 shows a speed-optimized DRAM array structure that uses sub-wordline decoders to achieve shorter wordlines and reduced wordline RC delay. Simply cutting the length of the wordline in half reduces both resistance and capacitance by a factor of two, resulting in a fourfold improvement in RC delay. Reducing the length of the bitlines provides a similar improvement in bitline RC delay, plus the added benefit of an increased signal to the sense amplifiers, which further improves performance.
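The fourfold figure follows from the fact that both the resistance and the capacitance of a line scale with its length, so their product scales with length squared. A minimal sketch of that arithmetic, using normalized per-unit values and ignoring driver resistance and any fixed loads (both simplifying assumptions):

```python
def line_rc(length, r_per_unit=1.0, c_per_unit=1.0):
    """RC product of a uniform line: total resistance times total capacitance."""
    total_r = r_per_unit * length
    total_c = c_per_unit * length
    return total_r * total_c

full = line_rc(1.0)      # original wordline length (normalized)
half = line_rc(0.5)      # wordline cut in half
quarter = line_rc(0.25)  # wordline cut to one quarter

print(full / half)     # 4.0
print(full / quarter)  # 16.0
```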

However, this increased array fragmentation comes at a cost. It doubles or quadruples the number of bitline sense amplifiers for a given amount of memory and adds sub-wordline decoders that do not exist in the standard array. Typical commodity DRAMs have a cell efficiency (the ratio of memory cell area to total chip area) in the range of 50% to 60%. With the architecture optimized for fast access, a cell efficiency of 35% can be achieved (see Fig. 4). Yet there is still an enormous area advantage over conventional 6T SRAM, which was previously the only alternative for fast random-access applications.
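As a rough way to read these cell-efficiency numbers, the total area needed for a fixed number of bits is inversely proportional to cell efficiency. The sketch below assumes the same cell size in both architectures, which is a simplification, and the function name is hypothetical.

```python
def relative_area(eff_reference, eff_fast):
    """Total area of the fast array relative to the reference array for the
    same number of bits, assuming identical memory cell size in both."""
    return eff_reference / eff_fast

# Efficiencies quoted in the text: 50% to 60% for commodity DRAM, 35% for the fast array.
for eff in (0.50, 0.60):
    ratio = relative_area(eff, 0.35)
    print(f"reference efficiency {eff:.0%}: fast array needs {ratio:.2f}x the area")
```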

Finally, breaking free from the constraints of commodity memory standards, the datapath pipeline can be fully optimized for speed. Figure 5 shows the external timing and internal signals for Fast DRAM. Page mode no longer exists. This eliminates the separate activation and precharge commands, leaving only read and write commands. Activation and precharge are performed automatically as part of individual read or write commands. The internal DRAM core completes a full row cycle within one cycle of the external clock. Read data is available at the output pins with a latency of two clock cycles. Both row and column addresses are provided at the same time as the read or write command.

Bitline precharge occurs at the beginning of the cycle while the row address is being decoded. Short wordlines and bitlines with minimum RC delay let data sensing occur within a fraction of the external clock cycle. By the time the next rising edge of the clock arrives, the data at the bitline sense amplifier has been sampled by the output pipeline. This enables the array to be precharged in anticipation of the next access, achieving full random access on every clock cycle.
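The external behavior described above (one command per clock, row and column address supplied together, read data two cycles later) can be modeled in a few lines. This is a behavioral sketch only, not a description of the actual circuit; the memory contents and function name are made up for illustration.

```python
READ_LATENCY = 2  # clock cycles from read command to data at the output pins

def simulate_reads(addresses, memory):
    """Issue one read per clock cycle and return a list of
    (issue_cycle, data_cycle, row, col, data) tuples."""
    timeline = []
    for cycle, (row, col) in enumerate(addresses):
        # Row and column arrive with the command; no separate activate or precharge.
        data = memory[(row, col)]
        timeline.append((cycle, cycle + READ_LATENCY, row, col, data))
    return timeline

# Hypothetical 4-row by 16-column memory filled with a simple pattern.
memory = {(r, c): r * 16 + c for r in range(4) for c in range(16)}

# A different row on every cycle: fully random access.
reads = [(3, 1), (0, 7), (2, 15), (1, 0)]
timeline = simulate_reads(reads, memory)

for issue, ready, row, col, data in timeline:
    print(f"cycle {issue}: READ row={row} col={col} -> data {data} at cycle {ready}")
```

Even though every access targets a new row, data emerges on consecutive clock cycles, which is the property that page-mode SDRAM cannot offer.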

Applying these techniques to a 0.25-µm embedded DRAM process, Mosaid achieved fully pipelined random-access operation at 160 MHz (6.25-ns tRC). With a die area penalty of less than 50% over conventional embedded DRAM architectures, and still more than five times the bit density of SRAM, fast embedded DRAM can perform random-access cycles at full ASIC internal clock rates.

As 0.18- and 0.13-µm embedded DRAM processes become available, 200- to 250-MHz operation will be achievable with only a modest area increase over conventional DRAM architectures. Fast embedded DRAM opens up a whole new range of SoC applications that demand both high density and high speed.
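Because the core completes one row cycle per external clock, the random cycle time tRC is simply the clock period. A short sketch of the conversion for the clock rates quoted in the text (the function name is hypothetical):

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

for freq in (160, 200, 250):
    print(f"{freq} MHz -> tRC = {cycle_time_ns(freq):.2f} ns")
```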

