[Design Application]
Improve Backplane Performance With Source-Synchronous Designs
The source-synchronous interface is an ideal upgrade for passive backplanes and on-card buses.
System performance goals are in a constant march toward higher levels of throughput and bandwidth. This progression is forcing designers to leave the comfort of traditional synchronous interfaces behind. These traditional designs suffer mainly from physical performance limitations. A synchronous interface works in "absolute" time: all of its agents take their marching orders from a dedicated clock source that is distributed over equal-length traces to minimize skew across the system.
Current levels of integration and IC process capability have significantly reduced the timing delays associated with interface ICs as well as skew from clock drivers. Still, the transport delay cannot be eliminated or ignored. It simply takes time to move signals from one agent to another. Because the transport delay, or flight time, of a signal limits the system's operating frequency, designers have resorted to wider parallel buses to meet overall system bandwidth requirements. Beyond a certain point, however, the pain and cost associated with ever-wider buses overshadow the performance gained. Ultimately, alternative solutions must be considered.
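To put rough numbers on that tradeoff, the sketch below assumes that peak bandwidth is simply bus width times transfer rate, with protocol overhead ignored. The function name and the 32-bit, 64-bit, 33-MHz, and 66-MHz figures are illustrative assumptions, not values taken from this article.

```python
def peak_bandwidth_mbytes_per_s(bus_width_bits: int, clock_mhz: float,
                                transfers_per_clock: int = 1) -> float:
    """Peak bandwidth in Mbytes/s, ignoring protocol overhead (a simplification)."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock


# Hypothetical starting point: a 32-bit bus whose clock is capped by flight time.
baseline = peak_bandwidth_mbytes_per_s(32, 33.0)

# Option 1: double the width and keep the flight-time-limited clock.
wider = peak_bandwidth_mbytes_per_s(64, 33.0)

# Option 2: keep the width and double the clock, which is only possible
# if flight time no longer sets the cycle time.
faster = peak_bandwidth_mbytes_per_s(32, 66.0)

print(baseline, wider, faster)
```

Under these assumptions both options give the same peak figure. The difference is that the wider bus pays for it in pins, connectors, and routing, while the faster clock requires flight time to stop setting the cycle time.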
One possible solution is to incorporate a source-synchronous interface onto a traditional passive-synchronous backplane interface. Because the two share a parallel architecture, this interface is an ideal upgrade from the traditional synchronous design. It improves the throughput of many passive backplanes and on-card buses. All interface architectures involve tradeoffs. But in a source-synchronous design, the advantage of additional system bandwidth far outweighs the cost of implementation.
Specific source-synchronous implementations are found in areas where bus throughput is critical to the overall system performance. Two good examples are double data rate (DDR) and Rambus memory modules. Both employ variations of a source-synchronous architecture, thereby improving the bandwidth of the memory subsystem in computers.
A source-synchronous system uses a strobe or clock signal generated by the address/data signal source to latch or clock the address/data signals at the receiving agent. Because this self-timed strobe travels with the data, it eliminates the flight-time variable from the system timing equations. Eliminating flight time allows the designer to maximize the potential bandwidth of any interface technology by increasing the operating frequency. And because interface signal timing now works in "relative" time, the global skew requirements on the system clock are likely to be relaxed.
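A minimal sketch of why flight time drops out in "relative" time follows. It assumes a simplified model in which the source launches the data at time zero and the strobe a fixed offset later, and both travel over traces of stated delay. The function name and all numbers are hypothetical.

```python
def setup_at_receiver_ns(strobe_offset_ns: float,
                         data_flight_ns: float,
                         strobe_flight_ns: float) -> float:
    """Time by which data precedes the strobe at the receiving agent.

    Data is launched at t = 0 and the strobe at t = strobe_offset_ns,
    both from the same source (simplified model, an assumption).
    """
    data_arrival = data_flight_ns
    strobe_arrival = strobe_offset_ns + strobe_flight_ns
    return strobe_arrival - data_arrival


# Matched strobe and data traces: the result is the launch offset,
# whether the receiving card is near (1 ns away) or far (5 ns away).
near = setup_at_receiver_ns(2.0, data_flight_ns=1.0, strobe_flight_ns=1.0)
far = setup_at_receiver_ns(2.0, data_flight_ns=5.0, strobe_flight_ns=5.0)

# Only a mismatch between strobe and data flight times still matters.
mismatched = setup_at_receiver_ns(2.0, data_flight_ns=5.0, strobe_flight_ns=4.75)

print(near, far, mismatched)
```

In this model the absolute flight time cancels. What remains in the budget is the strobe-to-data mismatch, which is a local routing problem rather than a system-wide one.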
The Synchronous Interface
By studying a traditional synchronous design, we can establish a baseline performance level for a given interface (Fig. 1). Our study includes clock distribution, signal routing, and a typical solution. The solution results are shown in a graphical or visual form to allow a better understanding of all the variables and the degree to which they affect overall system timing and performance.
A centrally located clock source uses matched trace delay to generate and distribute multiple clock signals. Ideally, these signals arrive at all synchronous elements or card edges at the exact same instant. In most systems, an additional level of clock distribution is undertaken at the card level. This second level is often handled by some form of phase-locked loop (PLL) so that all cards, independent of on-card clock requirements, present an equal load to the central clock. Therefore, one source of clock skew is eliminated.
Unlike the clock lines, data, address, and control signals are typically routed in a multidrop or daisy-chain arrangement. This topology allows for varied card-to-card routing delays based on the relative card positions. In an unterminated backplane design, like CompactPCI, the designer needs to account for an additional "settling time" that's equal to the maximum flight time of the backplane interface.
In an effort to improve the signal integrity and performance of multidrop interfaces like VME, an alternative routing structure may be employed. This structure, commonly referred to as a star configuration, routes signals in such a manner as to equalize the delays of any card-to-card transmission. Routing signals this way also serves to improve the switching behavior of traditional I/O drivers because the interface now looks and behaves more like a lumped capacitive load.
This technique, however, isn't without its disadvantages. Chief among them is the need for equal trace lengths between multiple system cards, which results in a very large number of card-to-card interconnects. This significantly adds to backplane routing and manufacturing complexity.
It would be possible to produce higher system speeds using PECL or other reduced-swing differential signal technologies. But the GTLP family makes for a good case study because both traditional and source-synchronous products are available.
To make the analysis complete, typical values for maximum clock skew, minimum and maximum flight time, crosstalk, and multiple output switching (MOS) events have also been included. The numbers shown don't necessarily represent state-of-the-art examples, but they are reasonable placeholders with which to do this type of analysis (Fig. 2).
Robust design techniques, either traditional or source-synchronous, demand that the designer work all potential system variables to their extreme values. By pushing and stretching these design variables, a 3D space is created that defines the solution boundaries. For example, the setup margin calculation includes maximum clock skew, maximum interface IC propagation delay, maximum MOS and crosstalk signal effects, maximum card-to-card flight time, and minimum interface IC setup time.
Using the worst-case numbers will define the maximum possible data rate for this particular system. Changing any of the variables changes the system constraints too, and therefore, the maximum data rate. Engineering the system to work at the desired data throughput requires balancing all of the above constraints. Over-engineering costs time-to-market and resources, while under-engineering the system often leads to expensive failures in the field.
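The setup-margin bookkeeping described above can be sketched in a few lines. The model below assumes the worst-case terms simply add and that one transfer occurs per clock. The class, the function names, and every number are hypothetical placeholders, not the values of Fig. 2.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SyncBudget:
    """Worst-case terms of a synchronous setup budget, all in nanoseconds."""
    clock_skew_max: float
    prop_delay_max: float      # interface IC propagation delay
    mos_crosstalk_max: float   # combined MOS and crosstalk effects
    flight_time_max: float     # card-to-card flight time
    setup_time: float          # interface IC setup requirement

    def total_ns(self) -> float:
        # Simple additive model (an assumption of this sketch).
        return (self.clock_skew_max + self.prop_delay_max
                + self.mos_crosstalk_max + self.flight_time_max
                + self.setup_time)


def setup_margin_ns(budget: SyncBudget, clock_period_ns: float) -> float:
    """Positive margin means the data is stable before the capturing clock edge."""
    return clock_period_ns - budget.total_ns()


def max_clock_mhz(budget: SyncBudget) -> float:
    """Clock rate at which the setup margin falls to zero."""
    return 1000.0 / budget.total_ns()


# Hypothetical worst-case values in ns.
example = SyncBudget(clock_skew_max=1.0, prop_delay_max=6.0,
                     mos_crosstalk_max=1.0, flight_time_max=5.0,
                     setup_time=2.0)

print(setup_margin_ns(example, clock_period_ns=20.0))  # margin at a 50-MHz clock
print(max_clock_mhz(example))

# Changing any one variable moves the limit, here a shorter backplane.
shorter = replace(example, flight_time_max=2.0)
print(max_clock_mhz(shorter))
```

A full analysis would add the matching hold-margin check, which uses the minimum delays rather than the maximum ones.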