Interfacing FPGAs To High-Speed DRAMs Puts Designers To The Test
High-speed external-memory interfaces need tight timing constraints, DQ-DQS phase management, good signal integrity, and proper board designs.
FPGAs are finding greater use as core components in systems for networking, communications, storage, and high-performance computing applications requiring complex data processing.
So, it is now mandatory that FPGA vendors support high-speed, external memory interfaces. Recognizing this, today's FPGAs offer specialized features that allow them to interface directly with a variety of high-performance memory devices. We'll focus here on the design of high-speed DRAM-to-FPGA interfaces. This article describes the challenges and barriers involved with these interfaces and highlights solutions to address these obstacles.
Rest assured that designing high-speed external-memory interfaces is no simple task. Synchronous DRAMs, for example, have evolved into high-performance, high-density memories and are now being used in a host of applications. The latest DRAMs (DDR SDRAM, DDR2, and RLDRAM II) support frequencies ranging from 133 MHz (266 Mbits/s) to 400 MHz (800 Mbits/s).
Thus, designers are often confronted with the challenges of DQ-DQS phase management, tight timing constraints, signal-integrity issues, and simultaneously switching output (SSO) noise. Plus, certain board-design issues could prolong design cycles or force them to accept reduced performance. To make matters worse, all of these hurdles become more pronounced at high frequencies.
DQ-DQS PHASE-RELATIONSHIP MANAGEMENT
DDR SDRAMs rely on a data strobe signal (DQS) to achieve high-speed operation. DQS is a non-continuous-running strobe used for clocking data on the DQ lines. It's transmitted externally along with the data signals (DQ) to ensure that they track each other with temperature and voltage changes. The DDR SDRAM uses on-chip delay-locked loops (DLLs) to output DQS relative to the corresponding DQs.
The phase relationship between the DQ and DQS signals is important for DDR SDRAM and DDR2 interfaces. When writing to the DRAM, the memory controller in the FPGA must generate a DQS signal that's center-aligned within the DQ data signals. When reading from the memory device, the DQS coming into the FPGA is edge-aligned with respect to the DQ signals (Fig. 1).
Upon receiving the DQS signal, the memory controller must phase-shift it to be center-aligned with the DQ signals. The amount of time that the DQS must be delayed is governed by board-induced skew between the DQS and DQ groups, the resulting data-valid window at the controller, and the sampling-window requirements at the controller input registers.
This is one of the most challenging requirements for DRAM controller designs. Memory-interface designers can employ one of several techniques to align the DQS to the center of the data-valid window: board trace delay on DQS, on-chip delay elements on DQS, on-chip DLLs, or phase-locked loops (PLLs).
Board Trace Delay On DQS: This is the traditional approach for aligning DQS and a related DQ group. But the technique is inefficient and proves to be a performance barrier in sophisticated systems for the following reasons:
Using the 400-Mbit/s case as an example, the nominal delay for DQS with respect to DQ is 1.25 ns (assuming that the required phase shift for center-aligning the DQS signal with the DQ signal is 90°). To achieve this delay, approximately 7 to 8 in. of trace length must be added to the DQS line (based on an approximate delay of 160 ps/in. for an FR4 laminate microstrip with a 50-Ω characteristic impedance). Not only does this complicate board layout, it can also increase board cost if extra signal layers are required. This is especially true when interfacing with DIMMs, since routing the additional length needed for each DQS signal can be difficult.
The required delay and resulting trace length must be accurately predetermined. This locks the interface to a specific frequency, leaving designers little flexibility. Any changes in interface frequency would require laying out the board again.
Increased trace length also results in higher loss on the DQS line. Thus, rise and fall times are compromised, limiting the maximum attainable frequency.
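The trace-length arithmetic behind the first of these reasons can be checked with a short sketch. It assumes the figures quoted above (a 90° shift and roughly 160 ps/in. for FR4 microstrip); the function names are illustrative and not part of any vendor tool.

```python
def dqs_delay_ns(data_rate_mbps: float, phase_deg: float = 90.0) -> float:
    """Delay that shifts DQS by phase_deg of one clock period.

    A DDR interface transfers two bits per clock cycle, so the clock
    frequency in MHz is half the per-pin data rate in Mbits/s.
    """
    clock_mhz = data_rate_mbps / 2.0
    period_ns = 1000.0 / clock_mhz
    return period_ns * phase_deg / 360.0


def trace_length_in(delay_ns: float, ps_per_inch: float = 160.0) -> float:
    """Extra trace length (inches) that produces delay_ns of delay.

    160 ps/in. is the approximate FR4 microstrip figure from the text;
    a real board needs the value for its own stackup.
    """
    return delay_ns * 1000.0 / ps_per_inch


if __name__ == "__main__":
    # Data rates taken from the range quoted earlier in the article.
    for rate in (266, 400, 800):
        delay = dqs_delay_ns(rate)
        length = trace_length_in(delay)
        print(f"{rate} Mbits/s: delay {delay:.3f} ns, extra trace {length:.2f} in.")
```

Because the required length changes with every data rate, a trace cut for one frequency is wrong for any other, which is why a frequency change forces a new board layout.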
On-Chip Delay Elements: This approach uses a number of delay elements connected in series to achieve a predetermined delay. The delay, and the number of delay elements needed to achieve it, must be calculated for each frequency bin based on the frequency of operation. Designers can then combine coarse and fine delays to tune the result to the desired value. However, delay elements are inherently susceptible to process, voltage, and temperature (PVT) variations, which can be up to ±40%. This variation in delay decreases the effective sampling window for the controller, and it doesn't scale with frequency. This limitation makes the approach useful only at lower frequencies (133 MHz and below).
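To put the ±40% figure in perspective, the following sketch computes how far a fixed chain of delay elements can drift from its nominal value. The 50-ps per-element delay is a hypothetical number chosen only for illustration; the ±40% spread is the figure quoted above.

```python
def chain_delay_bounds(target_ps: float, element_ps: float, pvt: float = 0.40):
    """Size a fixed delay chain and return its delay range over PVT.

    target_ps  : desired nominal delay
    element_ps : nominal delay of one element (hypothetical value)
    pvt        : fractional delay variation (0.40 means +/-40%)

    Returns (element_count, min_ps, nominal_ps, max_ps).
    """
    count = max(1, round(target_ps / element_ps))
    nominal = count * element_ps
    return count, nominal * (1.0 - pvt), nominal, nominal * (1.0 + pvt)


if __name__ == "__main__":
    # 1250 ps is the 90-degree shift for the 400-Mbit/s case in the text.
    count, lo, nominal, hi = chain_delay_bounds(1250.0, 50.0)
    print(f"{count} elements: {lo:.0f} ps to {hi:.0f} ps (nominal {nominal:.0f} ps)")
```

For the 400-Mbit/s case, a 1.25-ns nominal delay can land anywhere within about ±0.5 ns, a large share of the 2.5-ns bit time before board skew and register setup and hold are even considered.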
On-Chip DLLs: To solve the design issues in the above two implementations, designers can utilize on-chip DLLs to introduce delay onto the DQS lines. By using a reference clock at the desired interface frequency and expressing the required delay as a percentage of that clock period, the DLL can pick the right number of delay elements to achieve the desired delay.
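The idea can be illustrated with a simplified behavioral model. This is a sketch of the general principle only, not the circuitry of any particular FPGA: it assumes the DLL effectively knows the current per-element delay (by comparing it against the reference clock period) and selects the element count that comes closest to the requested fraction of that period. The element delays of 30, 50, and 70 ps are hypothetical values representing a 50-ps element at the ±40% PVT extremes.

```python
def dll_tap_count(period_ps: float, element_ps: float, phase_deg: float = 90.0) -> int:
    """Number of delay elements closest to phase_deg of the reference period."""
    target_ps = period_ps * phase_deg / 360.0
    return max(1, round(target_ps / element_ps))


def achieved_delay_ps(period_ps: float, element_ps: float, phase_deg: float = 90.0) -> float:
    """Delay actually produced once the element count has been selected."""
    return dll_tap_count(period_ps, element_ps, phase_deg) * element_ps


if __name__ == "__main__":
    period_ps = 5000.0  # 200-MHz reference clock (400 Mbits/s per pin)
    for element_ps in (30.0, 50.0, 70.0):  # hypothetical fast, typical, slow corners
        taps = dll_tap_count(period_ps, element_ps)
        delay = achieved_delay_ps(period_ps, element_ps)
        print(f"element {element_ps:.0f} ps: {taps} taps, delay {delay:.0f} ps")
```

In this model, the residual error is bounded by half of one element delay rather than by the full PVT spread, and changing the interface frequency only changes the reference period rather than the board.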
For example, Altera uses this method in its Stratix and Stratix II FPGAs to achieve the 90° DQS phase shift during the read operation. These FPGAs feature on-chip DQS phase-shift circuitry and dedicated DQS-DQ I/O groups at the top and bottom of the chip. When not interfacing with external memory, these pins can be used as general-purpose I/Os.
However, when interfacing with external memories such as DDR SDRAM, these pins must be used for DQS. Each DQS signal is associated with a group of DQ signals. DQS:DQ group ratios can be 1:4, 1:8, 1:16, 1:18, 1:32, or 1:36 when using Stratix II FPGAs, and 1:8, 1:16, or 1:32 with Stratix FPGAs.
The dedicated DQS pins tie internally to a set of delay elements before being routed to the I/O input registers. The cumulative delay of these elements is controlled by the DQS phase-shift circuitry. The dedicated DQS phase-shift circuitry, which consists of a DLL and control circuitry, enables automatic, on-chip delay insertion on incoming DQS signals during a read operation. This DQS phase-shift circuitry uses a frequency reference to generate control signals for the delay elements on each of the dedicated DQS pins, allowing it to compensate for PVT variations. Further, to minimize channel-to-channel skew, the phase-shifted DQS signal is transferred to the DQ I/O elements (IOE) via a balanced clock network.