Electronic Design

  
[Design Application]
Optimize Memory Subsystem For Top Performance
A Better Understanding Of Memory Accesses Allows DSP Memory Subsystems To Be Better Matched To The DSP Chips.

Contributing Author  |   ED Online ID #7628  |   May 25, 1998


Designers are increasingly using multiple DSP chips in applications

that contain huge data sets--tens to hundreds of megabytes. Such applications

can no longer be economically implemented with static RAMs, most of

which typically have maximum capacities of 512 kbytes. Consequently,

many system designers must consider the use of dynamic RAMs (DRAMs)

to provide the larger memory space. Most DRAMs, however, are designed

for PC workstations. To optimize DRAM use in DSP applications, designers

must select the correct DRAM technology based on a different set of

goals.

In addition, most DSP chips are optimized for I/O handling, and that

typically means an interface optimized for use with SRAMs. As a result,

overall memory subsystem performance in a DSP application depends

on both the memory technology and the DSP chip's external interface.

Designers can pick from several DRAM architectures, each of which

brings a number of pros and cons for various DSP system implementations.

Thus, a better understanding of DRAM architectures and the DSP memory

interface will allow designers to better optimize the memory subsystem

for multiprocessor DSP applications.

On PCs, short read bursts for instruction cache-line fills have dominated

accesses to main memory. But the increasing use of object-oriented

languages and multitasking operating systems on PCs has led to a

significant number of accesses that are dispersed throughout main

memory. This, in turn, has led to an increasing emphasis on random-access

latency instead of solely on burst-access time for subsequent reads

to an open DRAM page.

Due to the emphasis on random-access latency, many PC manufacturers

were slow to replace EDO (extended data out) DRAMs with synchronous

DRAM (SDRAM) technology, which emphasizes burst accesses. In a typical

66-MHz memory implementation, SDRAM adds a cycle of latency on the

initial access in exchange for one less cycle on each of the subsequent

accesses. For a four-clock burst the net result is a two-cycle savings,

but that is only relevant if more than just the first fetch was needed.
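
The burst arithmetic above can be checked with a short sketch. The EDO timing of 5-2-2-2 cycles used here is an illustrative assumption, not a figure from this article; only the one-cycle adjustments for SDRAM come from the text. The function name burst_cycles is likewise hypothetical.

```python
def burst_cycles(first, subsequent, burst_len):
    """Total clock cycles for a burst: one initial access plus the rest."""
    return first + subsequent * (burst_len - 1)


# Assumed (illustrative) EDO timing at 66 MHz: 5 cycles, then 2 per access.
EDO_FIRST, EDO_NEXT = 5, 2
# Per the text: SDRAM adds one cycle up front and saves one on each later access.
SDRAM_FIRST, SDRAM_NEXT = EDO_FIRST + 1, EDO_NEXT - 1

for burst_len in (1, 2, 4):
    edo = burst_cycles(EDO_FIRST, EDO_NEXT, burst_len)
    sdram = burst_cycles(SDRAM_FIRST, SDRAM_NEXT, burst_len)
    print(f"burst of {burst_len}: EDO {edo}, SDRAM {sdram}, saved {edo - sdram}")
```

Under these assumptions a single access is one cycle slower on SDRAM, a two-access burst breaks even, and a four-access burst saves two cycles.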

Differing Emphasis
In a DSP system, the speed of instruction loads is generally not

the main concern. Signals are typically processed as vectors, which

are many times the length of the data cache line. The code for the

tight inner loops of signal processing is typically loaded once for

a long vector of data. The emphasis, therefore, is on the speed of

both subsequent reads to the same cache line and immediate

access to sequential memory locations.

The workhorse dynamic memories like standard fast-page mode (FPM)

DRAMs, EDO DRAMs, and burst-EDO DRAMs are basically the same, save

for some differences in the interface for reading data out at the

time of the column access strobe (CAS) signal. With FPM DRAMs, the

CAS signal causes data to be read directly from the sense amplifiers.

EDO DRAMs add a latch to the output of those sense amplifiers, which

allows the data-output buffers to stay on even after the rising edge

of CAS. The result is a faster cycle time from column address to column

address--up to a third faster than standard FPM DRAMs.

Burst-EDO DRAMs replace the output latch on the EDO DRAM with a register.

That adds an internal pipeline stage, which allows data within a burst

to come out quicker after the CAS signal for the second and subsequent

accesses in the burst. The trade-off is an extra pipeline stage for

the CAS signal on the first access, but this does not lower performance

because the first data access is limited by the row access strobe

(RAS) time, not the CAS time.

SDRAMs present more of an architectural change from FPM DRAMs than

do the EDO DRAM variations. From the DSP system designer's standpoint,

the important differences are that SDRAMs are synchronous and use

a clock input. Internally, an SDRAM divides the memory into multiple

banks, each with its own row decoder and sense amps. Current high-performance

SDRAMs use four internal memory banks, although earlier versions typically

used two banks (Fig. 1).

The multibank architecture eliminates gaps between data accesses

because data can be accessed from one bank while the others are precharging.

The SDRAMs buffer both inputs and outputs, and that does affect the

latency for the first access in a burst. The increased pipelining,

though, enables both quicker access to a full burst and operation

at higher frequencies, compared to EDO DRAMs.
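
A toy model can illustrate how multiple banks hide precharge time. Everything in it is an assumption made for illustration: the function name, the round-robin access pattern, and the cycle counts (4-cycle bursts, and 6 cycles of precharge plus row activation before a bank can deliver data again). It ignores the initial access latency and real SDRAM command timing.

```python
def total_cycles(n_bursts, n_banks, burst_len, recovery):
    """Cycles to transfer n_bursts bursts issued round-robin across banks.

    Each burst occupies the shared data bus for burst_len cycles. After a
    burst, its bank is unavailable for `recovery` cycles (precharge plus
    activation of the next row).
    """
    bank_ready = [0] * n_banks  # cycle at which each bank can deliver data
    bus_free = 0                # cycle at which the data bus becomes idle
    for i in range(n_bursts):
        bank = i % n_banks
        start = max(bus_free, bank_ready[bank])
        bus_free = start + burst_len
        bank_ready[bank] = bus_free + recovery
    return bus_free


# Assumed figures: 8 bursts of 4 cycles, 6 recovery cycles per bank.
for banks in (1, 2, 4):
    print(f"{banks} bank(s): {total_cycles(8, banks, 4, 6)} cycles")
```

With these numbers, a single bank leaves the data bus idle between bursts (74 cycles in total), two banks hide most of the recovery time (38 cycles), and four banks keep the bus continuously busy (32 cycles, which is 8 bursts of 4 cycles each).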

As a result, one of the key performance issues becomes how the system

can deal with pipelined memory operations. The highest memory-to-processor

throughput is achieved by using the multiple accesses inherent in

the bursts of a cache line load. If that approach isn't used, the

access rate is limited by the speed of the address bus, which usually

has a duty cycle that is only a fraction of that of the data bus. To reach the

full potential of pipelined memory systems, the pipeline should be

full as long as possible. Like a pump that needs priming, the data

through a pipelined memory system will incur startup latency every

time the pipeline stalls. Accessing the long vectors typically used

in signal processing data arrays helps keep the pipeline full.
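
The priming effect can be put in rough numbers. The sketch below assumes a simple model that is not from the article: after each stall the pipeline needs a fixed number of startup cycles, then delivers one word per cycle. The function name and the 5-cycle startup figure are hypothetical.

```python
def bus_efficiency(vector_len, startup_cycles):
    """Fraction of cycles that carry data for one uninterrupted transfer."""
    return vector_len / (vector_len + startup_cycles)


STARTUP = 5  # assumed startup latency in cycles, for illustration only

for n in (4, 64, 1024):
    print(f"{n:5d} words: {bus_efficiency(n, STARTUP):.1%} of cycles carry data")
```

In this model a four-word transfer uses fewer than half of its cycles for data, while a 1024-word vector uses more than 99 percent of them.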

Match Latency To Pipeline
When evaluating the various memory technologies for use in DSP systems,

the designer should match each technology to the processor's capabilities.

That is, the latency of the memory subsystem should be matched to

the pipeline capabilities of the processor. The more pipelining in

the processor, the higher the latency it can tolerate in the memory

and the memory controller without affecting throughput.
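
One way to reason about this matching is in terms of outstanding requests. The sketch below is an assumed model, not the article's method: the processor can keep a fixed number of memory requests in flight, the memory returns each one after a fixed latency, and at most one word can be delivered per cycle. The names and numbers are hypothetical.

```python
def sustained_throughput(requests_in_flight, latency_cycles):
    """Words per cycle, capped at one, for a steady stream of requests."""
    return min(1.0, requests_in_flight / latency_cycles)


for depth in (1, 4, 8):
    for latency in (4, 8):
        rate = sustained_throughput(depth, latency)
        print(f"depth {depth}, latency {latency}: {rate:.2f} words/cycle")
```

In this model a processor that can keep eight requests in flight sustains full throughput at a latency of eight cycles, while one limited to a single outstanding request drops to one word every eight cycles.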

