A new tool has arrived that fully automates the synthesis of highly optimized pipelines. To augment its existing Volare synthesis environment, Get2Chip Inc., San Jose, Calif., has created Pipeline Master, a tool that automatically optimizes all critical pipeline parameters (Fig. 1).
In the design of processing engines, pipelining is considered a crucial technique that can greatly increase the unit's throughput. Pipelines are especially useful for brute-force calculations that repeat over and over. In fact, pipelining can even be applied to nonprogrammable engines, such as a hard-coded encryption engine. Whether or not the calculation engine being pipelined is programmable, pipelining lets a circuit process multiple sets of data in parallel, but staggered by one or more clock cycles.
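The idea can be seen in miniature in a generic two-stage Verilog sketch (our own illustration, unrelated to any Get2Chip output): while the second stage adds one data set, the first stage is already multiplying the next, so two computations are in flight at once, staggered by a clock cycle.

module pipe2 (
    input  wire        clk,
    input  wire [15:0] a, b, c,
    output reg  [31:0] result
);
    reg [31:0] prod_s1;   // stage-1 register
    reg [15:0] c_s1;      // carries c alongside stage 1

    always @(posedge clk) begin
        prod_s1 <= a * b;          // stage 1: multiply
        c_s1    <= c;
        result  <= prod_s1 + c_s1; // stage 2: add the previous product
    end
endmodule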
Historically, pipeline design has been a manual and time-consuming process. Working in handwritten code at the register-transfer level (RTL), designers have had to design, evaluate, and redesign while analyzing their results for each of the multiple tradeoffs required.
For example, one might design a pipeline that's perfectly optimized for the area it consumes in silicon. Adding another variable to the equation, like data rate, would necessitate starting from scratch and recoding all over again. Some designers try a "spreadsheet" form of analysis, but such analyses typically don't model all sources of timing.
Moreover, even a redesign of an existing pipeline for a different silicon process requires a fresh approach and a new cycle of hand coding. It's difficult to predict how physical effects will scale when a device moves from, say, a 0.15- to a 0.13-µm process. There has been little choice but to start over, re-evaluating all of the tradeoffs that one must make in a pipeline design.
Pipeline Master accounts for all elements in what Get2Chip calls the "pipeline implementation space." Elements include the number of stages in the pipeline (which translates into the pipeline's latency), how operations are assigned to each stage, the input data rate, and the clock frequency. All of these parameters are considered by the tool concurrently as it automatically implements many different versions of the design. In the process, the tool searches for the right mix of attributes to deliver the best possible throughput.
Each version of the pipeline implemented by the tool is referred to as a "transformation." Each is derived from highly accurate timing calculations using process-accurate models, and the versions are produced by interweaving high-level and logic synthesis. Transformations are automatic explorations of different structures. For example, the tool might try an extra adder or memory port for a given pipeline scheme to decrease design latency by one or more clock cycles.
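The adder example can be pictured with a sketch of our own (the module names and coding style are illustrative assumptions, not the tool's actual transformations). Computing a + b + c with a single shared adder takes two cycles, whereas spending an extra adder finishes the same sum in one cycle.

// Version A: one adder, time-shared over two cycles (latency = 2).
// The two additions are mutually exclusive, so synthesis can share one adder.
module sum3_shared (
    input  wire        clk, rst, start,
    input  wire [15:0] a, b, c,
    output reg  [17:0] y
);
    reg        second;    // high while the second addition is pending
    reg [16:0] partial;   // holds a + b from the first cycle
    always @(posedge clk) begin
        if (rst)
            second <= 1'b0;
        else if (start) begin
            partial <= a + b;        // first use of the adder
            second  <= 1'b1;
        end else if (second) begin
            y       <= partial + c;  // second use of the adder
            second  <= 1'b0;
        end
    end
endmodule

// Version B: an extra adder collapses the sum into one cycle (latency = 1).
module sum3_fast (
    input  wire        clk,
    input  wire [15:0] a, b, c,
    output reg  [17:0] y
);
    always @(posedge clk)
        y <= a + b + c;              // two adders chained in one stage
endmodule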
As transformations are arrived at, the tool examines elements of their performance, such as wire delays and multiplexers, at the physical level. The results of its analyses are fed back into the engine, which makes the tradeoff decisions. Delays aren't estimated at a coarse, integer level of granularity; they're objectively quantified. So the tool continually and automatically evaluates and re-evaluates the results of various implementations, much as a designer might have done manually, but at much greater speed.
From the user's perspective, these machinations aren't a particularly visible part of the process. What the user sees is an interface from which he or she can set constraints manually, if so desired. For example, the user can set a particular clock speed or number of stages. The end result is a pipeline scheme that meets the user's constraints and achieves timing closure on the first pass. This is guaranteed by the tool's low-level implementation of the pipeline.
Through the pipelining of operations and random logic, as well as Get2Chip's patent-pending transformation of a pipeline into the optimal number of stages, designers are assured higher-quality results and greater productivity. The tool also supports flushing, or off-loading, of data in the pipeline and stalling, or pausing, of data processing.
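What stall and flush control look like at the register level can be sketched generically (an assumed interface of our own, not Pipeline Master's generated code): each stage holds its contents when stalled and drops them when flushed.

module pipe_stage #(parameter W = 32) (
    input  wire         clk, rst,
    input  wire         stall,    // pause: hold current contents
    input  wire         flush,    // off-load: invalidate contents
    input  wire         in_valid,
    input  wire [W-1:0] in_data,
    output reg          out_valid,
    output reg  [W-1:0] out_data
);
    always @(posedge clk) begin
        if (rst || flush)
            out_valid <= 1'b0;       // flushed data is dropped
        else if (!stall) begin
            out_valid <= in_valid;   // advance the pipeline
            out_data  <= in_data;
        end
        // when stall is high, the stage simply holds its data
    end
endmodule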
In addition, performance optimization is achieved by splitting operations or logic across stages. This balances the delays through stages for greater throughput. "Being able to split operations allows us to take, say, a 32-bit multiplier, and rather than stuff it into one stage of the pipeline, we can break its operation up across multiple stages," says Steve Carlson, Get2Chip's director of marketing. "The result is a higher clock rate and more control over those balanced delays."
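One way to picture the split Carlson describes is a hand-written sketch (again our own illustration, not the tool's output) in which a 32-bit multiply is broken into partial products registered in a first stage and combined in a second, shortening each stage's delay.

module mult32_2stage (
    input  wire        clk,
    input  wire [31:0] a, b,
    output reg  [63:0] p
);
    // Stage 1: four 16 x 16 partial products, registered.
    reg [31:0] ll, lh, hl, hh;
    always @(posedge clk) begin
        ll <= a[15:0]  * b[15:0];
        lh <= a[15:0]  * b[31:16];
        hl <= a[31:16] * b[15:0];
        hh <= a[31:16] * b[31:16];
    end
    // Stage 2: shift and add the partial products.
    always @(posedge clk)
        p <= {hh, 32'b0} + {hl, 16'b0} + {lh, 16'b0} + {32'b0, ll};
endmodule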
An important aspect of pipeline optimization is how the pipeline interacts with memories. Pipeline Master supports abstract descriptions of these interactions and deals directly with memory operations within the pipeline.
Even as the tool supports abstract models of memories, it also automates the implementation details. For instance, it can implement access to memory in one or two clock cycles for most memory types.
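A generic sketch (an assumed memory style, not tool output) shows the difference between the two access times: the two-cycle read simply adds an output register.

module ram_reads #(parameter AW = 8, DW = 32) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] addr,
    input  wire [DW-1:0] wdata,
    output reg  [DW-1:0] rdata_1cyc,   // data one cycle after the address
    output reg  [DW-1:0] rdata_2cyc    // registered again: two-cycle access
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    always @(posedge clk) begin
        if (we) mem[addr] <= wdata;
        rdata_1cyc <= mem[addr];       // synchronous read, one cycle
        rdata_2cyc <= rdata_1cyc;      // extra output register, two cycles
    end
endmodule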
The tool handles the complex timing requirements that the memories themselves dictate. Automatic management of pipeline hazards, such as read/write conflicts, is provided. Also accounted for are dependencies between reads and writes to the same locations, address overlaps for reads followed by writes, and out-of-order operations.
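One such hazard, a read that immediately follows a write to the same address, is commonly resolved with a bypass. The sketch below is a generic illustration with assumed names and ports, not the tool's mechanism: freshly written data is forwarded to the read port when the addresses collide, so the pipeline sees the new value without stalling.

module ram_raw_bypass #(parameter AW = 8, DW = 32) (
    input  wire          clk,
    input  wire          we,
    input  wire [AW-1:0] waddr, raddr,
    input  wire [DW-1:0] wdata,
    output wire [DW-1:0] rdata
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [DW-1:0] read_q;
    reg [DW-1:0] wdata_q;
    reg          collide;              // read and write hit the same address
    always @(posedge clk) begin
        if (we) mem[waddr] <= wdata;
        read_q  <= mem[raddr];
        collide <= we && (waddr == raddr);
        wdata_q <= wdata;
    end
    assign rdata = collide ? wdata_q : read_q;   // forward on conflict
endmodule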
Note that all transformations performed by Pipeline Master are checked against the design rules of the process being used. Items like the maximum fanout or maximum transition time on the input to a pin are considered. Design rules are derived directly from the process library. In this way, the tool anticipates design-rule checking (DRC) and ensures that pipeline implementations will stay within DRC limits. This aspect of the tool is yet another benefit of its forays into low-level implementation as it evaluates transformations.
Another notable characteristic of Pipeline Master's results is that they're inherently "designed for test." For one thing, the tool's accurate analysis and optimization deliver better-quality results. "You're using different kinds of register elements when you're doing the design for test," says Carlson. Those elements are scan registers, whose timing, area, and power characteristics differ enough from generic registers that the optimization must take into account that cells of this type will be used. "We can automatically configure the design with the appropriate elements interconnected correctly, such that the timing, area, and power analyses are all correct," he adds.
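A generic sketch of such a register element (plain RTL for illustration; production designs use the library's scan cells) shows why its timing, area, and power differ from an ordinary register: an extra mux and two extra ports sit in front of the flip-flop.

module scan_dff (
    input  wire clk,
    input  wire d,         // functional data
    input  wire scan_in,   // serial scan-chain input
    input  wire scan_en,   // selects scan mode
    output reg  q
);
    always @(posedge clk)
        q <= scan_en ? scan_in : d;    // scan path or normal functional path
endmodule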
A handy feature of the tool is its ability to create verification-friendly, latency-accurate models from an unpiped source. An unpiped source would be a functional model of a design that's not bus-cycle accurate. Writing testbenches for an untimed model won't work at either RTL or gate level, so the designer would have to rewrite the testbenches after synthesis. With Pipeline Master, users can start with an unpiped source. After pipelining the model, the tool will write a new model that's bus-cycle accurate, permitting the designer to write testbenches based on that model for use and reuse throughout the design refinement process.
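The payoff can be sketched with a small, self-contained example of our own (the module and signal names are placeholders, not Pipeline Master output): once the latency of the pipelined model is known, here assumed to be two cycles, the testbench simply delays its reference result by the same amount and checks the output every cycle.

module pipe_mult_2stage (input wire clk, input wire [15:0] a, b,
                         output reg [31:0] p);
    reg [31:0] p1;
    always @(posedge clk) begin
        p1 <= a * b;    // stage 1
        p  <= p1;       // stage 2
    end
endmodule

module tb_latency_check;
    reg         clk = 0;
    reg  [15:0] a = 0, b = 0;
    wire [31:0] p;
    reg  [31:0] exp0, exp1;      // reference delayed by the known latency (2)

    pipe_mult_2stage dut (.clk(clk), .a(a), .b(b), .p(p));
    always #5 clk = ~clk;

    always @(posedge clk) begin
        exp0 <= a * b;           // same function, untimed
        exp1 <= exp0;            // align with the two-cycle latency
        if (exp1 !== p) $display("mismatch at %0t", $time);
        a <= a + 3;              // simple stimulus
        b <= b + 7;
    end
    initial #200 $finish;
endmodule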
An Insightful View
Some insight into the tool's functionality can be gained by looking at its graphical user interface (Fig. 2). In this example, the upper right-hand window, labeled HDL (hardware description language), shows a Verilog source-code fragment with a FOR loop that includes one clock edge, @posedge ck. In the upper center, the Data Flow Graph window, labeled DFG, illustrates the flow of that loop after high-level synthesis. In this example, the tool implemented a multistage pipeline with banks of registers, indicated by rectangles, between the stages. The window at the lower left, labeled FSM, provides a finite state-machine diagram of the code. State "s20" processes the conditional IF and branches to state "s19" when the condition isn't met. In the exit branch, the pipeline is flushed.
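The fragment itself isn't reproduced here, but a loop of that general shape might look like the following reconstruction (purely illustrative; the actual code in Fig. 2 differs). Because the loop body contains a single clock edge, each iteration occupies one clock cycle, which high-level synthesis then maps onto the pipeline shown in the DFG window.

module loop_example #(parameter N = 8) (
    input  wire        ck,
    input  wire [15:0] data_in,
    output reg  [19:0] result
);
    integer      i;
    reg   [19:0] acc;
    always begin
        acc = 0;
        for (i = 0; i < N; i = i + 1) begin
            @(posedge ck);            // one clock edge inside the loop body
            acc = acc + data_in;      // work performed each iteration
        end
        result <= acc;                // exit branch: deliver the result
    end
endmodule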
At the lower right is the Schematics window, which portrays the code's block diagram. Comparator "comp_uns1" is highlighted in yellow by dint of its selection in the HDL window. The "comp_uns_A28" component is highlighted in red because the cursor is pointing at that object.
Lastly, the Sharing window at the upper left shows the distribution of hardware resources. The function "f" is shared twice in the pipeline's two stages.
Together, these various views of the design let users visualize it in both abstracted and detailed terms. They give users an analysis basis for manual control of pipeline parameters, aiding in both initial design and subsequent debugging.
The views are all linked and cross correlated, with the HDL source-code window being the prime control window. The various views can't be edited directly, thereby preventing the creation of incorrect or infeasible designs through manipulation. Changes must be made in the source code, from which the tool will generate only feasible designs.
Behind the tool stands a rich command set that lets users capitalize on the tool's analysis capabilities. "You don't always want to give the tool 100% control," explains Carlson. The command set is a way to implement manual control of the tool's output.
In the overall Volare synthesis environment, Pipeline Master serves as a specialization of high-level synthesis that solves a particularly thorny, increasingly important problem. The Volare environment automatically recognizes where Pipeline Master can and should be used, and invokes it. In applications that range from microprocessor design to signal processing, telecommunications circuits, encryption, speech recognition, and graphics, the tool takes on pipeline synthesis one pipeline at a time. The tool's output is fed through Volare's RTL and logic synthesis engines, and reoptimized in the context of the overall design.
Price & Availability
Pipeline Master is sold separately or as an option to the Volare design environment. Priced separately, Pipeline Master costs $25,000, while Volare costs $100,000. Pipeline Master began full volume shipment on December 3. Hardware platforms include Sun and HP workstations, as well as Linux-based PCs.
Get2Chip Inc., 2107 North First St., Suite 350, San Jose, CA 95131; (408) 501-9600; fax (408) 501-9610; www.get2chip.com.