[Leapfrog: First Look]
In Adding Control-Logic Support, A High-Level Synthesis Tool Goes Full Chip
David Maliniak
ED Online ID #21467
July 23, 2009
Copyright © 2006 Penton Media, Inc., All rights reserved.
High-level synthesis (HLS), or the notion of synthesizing
a design into RTL from a higher level of abstraction,
has been gaining currency among design
teams. For some time now, there have been compelling
reasons to explore HLS methodologies for certain kinds of
designs, or certain blocks within a larger design, such as
signal-processing blocks. Such a design flow can get you to RTL
faster from languages like C++ or SystemC. And because
simulation at the transaction level is, at least in theory, orders of
magnitude faster than at RTL, designers can verify more before
committing to RTL, so the RTL you get out of an HLS tool should be cleaner.
A leading HLS tool, Catapult C from Mentor Graphics, has
been continually improved since its 2004 launch. Initially built
for block-level synthesis to RTL from pure ANSI C++ input, it
has added optimizations for video and wireless designs and
the ability to synthesize multiple blocks. But significant as they
may have been, these improvements pale compared to the
latest overhaul. Mentor has now fully endowed Catapult C
with the ability to synthesize control-logic blocks, enabling it
to synthesize full chips from ANSI C++ to RTL.
The simultaneous HLS of algorithmic and control-logic
blocks has historically been an elusive goal. The two types of blocks have very different properties.
For example, algorithmic blocks synchronize
on data while control blocks
synchronize on clocks. In algorithmic
blocks, arbitration is implicit in the
code sequence. In control blocks, arbitration
is explicitly modeled. Typically,
algorithmic blocks never drop data,
while control blocks are often required
to drop and/or ignore data.
Algorithmic blocks are usually idle
when no data is available for processing.
Control blocks must execute and
update their states even if no data is
available. These disparities between the
algorithmic signal-processing blocks
and control-logic blocks have led to the
development of a number of domain-specific
language styles for coding of
control logic at levels of abstraction
above RTL. Bluespec comes to mind
as an example.
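To make the contrast concrete, here is a small C++ sketch of the two behaviors described above. Both structs are hypothetical illustrations, not Catapult C input or Mentor code: the accumulator synchronizes on data, stays idle when its queue is empty, and never drops a sample, while the timeout controller is stepped once per clock tick, advances its state with no data present, and ignores requests once it has timed out. The three-tick limit is an arbitrary assumption.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical illustration only; not Catapult C input or Mentor code.

// Algorithmic style: synchronizes on data. Every sample is consumed,
// none is dropped, and the block does no work while its input is empty.
struct AccumulatorBlock {
    long sum = 0;

    // Returns true if a sample was processed this step.
    bool step(std::deque<int>& in) {
        if (in.empty()) return false;  // idle: no data, no work
        sum += in.front();
        in.pop_front();
        return true;
    }
};

// Control style: synchronizes on the clock. tick() runs every cycle,
// the state advances even with no input, and some inputs are ignored.
struct TimeoutController {
    static constexpr int kLimit = 3;  // assumed timeout, in ticks
    int idle_ticks = 0;
    bool timed_out = false;
    int dropped = 0;

    void tick(std::optional<int> request) {
        if (timed_out) {
            if (request) ++dropped;  // after a timeout, requests are dropped
            return;
        }
        if (request) {
            idle_ticks = 0;  // activity restarts the timer
        } else if (++idle_ticks >= kLimit) {
            timed_out = true;  // state changed with no data present
        }
    }
};
```

Note that the controller's state changes on ticks where `request` is empty, something the accumulator never does.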
It’s worth pointing out at this juncture
that in Mentor’s view, there are three
different flavors of control logic. According
to Shawn McCloud, Mentor’s
product line director for HLS products,
Catapult has been synthesizing control
logic for years. “We had a philosophy
that we want to be able to infer the
control logic and automatically build it
for as long as possible,” says McCloud.
When it comes to intra-block control,
for example, much of the logic is not
explicitly coded in the C++ source but
can be inferred.
“Say an algorithm is performing a
transformation, such as a fast Fourier
transform. There’s a sequence of data
through the algorithm. When you synthesize
this and produce the data path,
all of the control logic related to interfacing
with this block can be implicitly
inferred from the C source and built
automatically,” McCloud adds. Catapult
C has been able to synthesize this sort
of intra-block control logic since its
launch in 2004.
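As a rough picture of what untimed algorithmic source looks like, the sketch below uses a four-tap FIR filter in place of the FFT in McCloud's example, a swap made only to keep the code short. The function name, tap count, and coefficients are assumptions. Notice what is absent: no clock, no ports, no valid/ready handshake, no state machine. Those interface details are the intra-block control logic that, per the article, the tool infers.

```cpp
#include <array>
#include <cassert>

// Hypothetical untimed algorithm in the style HLS tools consume.
// No clocks, ports, handshakes, or state machines appear here.
constexpr int kTaps = 4;

int fir(const std::array<int, kTaps>& coeffs,
        std::array<int, kTaps>& history,  // shift register, persists between calls
        int sample) {
    // Shift the new sample into the delay line.
    for (int i = kTaps - 1; i > 0; --i) history[i] = history[i - 1];
    history[0] = sample;
    // Multiply-accumulate across the taps.
    int acc = 0;
    for (int i = 0; i < kTaps; ++i) acc += coeffs[i] * history[i];
    return acc;
}
```

Feeding a single 1 followed by zeros returns the coefficients one per call, which is the filter's impulse response.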
In 2006, Catapult C introduced support
for a second variety of control
logic, known as multi-block dataflow
control logic. The idea here is chaining
single blocks to create a higher-level
subsystem. Again, the control logic is
not necessarily modeled in the source
code but is inferred. “This sort of logic
involves the communication channels
between the block and the top-level
finite state machine controller of the
system,” says McCloud. “This can be
very complicated, like, for instance, a
ping-pong memory manager.”
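A ping-pong memory manager is a double buffer: the upstream block fills one memory bank while the downstream block drains the other, and the banks swap roles when both are done. The class below is a hypothetical software model of that coordination, written for illustration only; it is not the control logic Catapult C generates, and the class and method names are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical software model of a ping-pong (double) buffer; not
// Catapult C output. The producer fills one bank while the consumer
// drains the other. Banks swap only when the write bank is full and
// the read bank is empty.
template <typename T, std::size_t N>
class PingPong {
public:
    // Producer side: returns false (stall) if no bank is free to write.
    bool write(const T& v) {
        if (wcount_ == N && !try_swap()) return false;
        banks_[wbank_][wcount_++] = v;
        if (wcount_ == N) try_swap();  // hand off a full bank as soon as possible
        return true;
    }

    // Consumer side: returns false (stall) if no full bank is ready.
    bool read(T& out) {
        if (rpos_ == rsize_ && !try_swap()) return false;
        out = banks_[1 - wbank_][rpos_++];
        if (rpos_ == rsize_) try_swap();
        return true;
    }

private:
    bool try_swap() {
        if (wcount_ != N || rpos_ != rsize_) return false;
        wbank_ = 1 - wbank_;  // the full bank becomes the read bank
        rsize_ = N;
        rpos_ = 0;
        wcount_ = 0;
        return true;
    }

    std::array<std::array<T, N>, 2> banks_{};
    std::size_t wbank_ = 0;   // bank the producer is filling
    std::size_t wcount_ = 0;  // samples written into that bank
    std::size_t rsize_ = 0;   // samples in the read bank (0 until first swap)
    std::size_t rpos_ = 0;    // next sample the consumer will read
};
```

With two-entry banks, the producer can run one full bank ahead of the consumer and then stalls until the consumer frees a bank.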
NEW ABILITIES
The leap forward in the latest incarnation
of Catapult C is its ability to
handle a third variety of control logic:
synchronous, reactive inter-block control logic. “This concerns synthesis of
control-centric blocks that are purely
reactive,” says McCloud. Because this
kind of control logic cannot simply be inferred
but must be explicitly defined in the C++ source,
designers need a way to model it
explicitly. “Now, you can
model a series of ports and make a
decision when there’s a conflict, such
as an arbiter when two requests are
coming in at the same time. The decision
as to which port wins is very much
a user decision,” says McCloud.
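The arbitration McCloud describes can be sketched as logic evaluated once per clock cycle. This hypothetical C++ model shows two policies side by side to underline that the winner of a conflict is a design choice: fixed priority always favors port 0, while round-robin alternates on conflicts so neither port starves. The names and policies are illustrative assumptions, not Catapult C constructs.

```cpp
#include <cassert>

// Hypothetical two-port arbiter, evaluated once per clock cycle.
enum class Grant { None, Port0, Port1 };

// Policy A: fixed priority. Port 0 always wins a conflict.
Grant fixed_priority(bool req0, bool req1) {
    if (req0) return Grant::Port0;
    if (req1) return Grant::Port1;
    return Grant::None;
}

// Policy B: round-robin. On a conflict, the port that was not granted
// most recently wins, so neither requester can starve.
class RoundRobinArbiter {
public:
    Grant arbitrate(bool req0, bool req1) {
        Grant g = Grant::None;
        if (req0 && req1) {
            g = (last_ == Grant::Port0) ? Grant::Port1 : Grant::Port0;
        } else if (req0) {
            g = Grant::Port0;
        } else if (req1) {
            g = Grant::Port1;
        }
        if (g != Grant::None) last_ = g;  // remember the most recent winner
        return g;
    }

private:
    Grant last_ = Grant::Port1;  // so Port0 wins the first conflict
};
```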
In adding synthesis of this sort of
reactive inter-block control logic, the
challenge for Mentor was to determine
how to maintain the abstraction benefits
of C++ while permitting users to
specify lower-level detail. The answer
comes in the form of a new synthesizable
C++ construct for asynchronous
data communication.
The construct lets designers easily
specify asynchronous data communication,
allowing full control of the creation
of concurrent hardware (see the
figure). It enables interfacing of data-driven
algorithms with control-centric
blocks synchronized by clocks.
“We call this a decoupling control
channel,” says McCloud. “The channel
handles data on one end and clocks
on the other, allowing you to connect
between these two abstraction
domains.” With this, designers now
have all the semantics needed to define
what control logic does, including prioritization
of tasks and coordination of
data. It also provides the ability to query
the channel for content availability. All
of this can be coherently modeled in
pure ANSI C++, in a coding style that’s
familiar to hardware designers, who
now can express communication, priority,
and task coordination within an
abstract representation of concurrency.
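The article does not show the syntax of Mentor's construct, so the following is only a hypothetical stand-in that mimics the behavior described: a channel that one block writes whenever it has data, and that a clocked control block can query for content before reading. The `Channel` class, its `available()` method, and the two-channel priority scheme are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical decoupling channel; names are illustrative, not Mentor's.
template <typename T>
class Channel {
public:
    void write(const T& v) { fifo_.push_back(v); }
    // Query for content: are at least n items waiting?
    bool available(std::size_t n = 1) const { return fifo_.size() >= n; }
    T read() {
        assert(!fifo_.empty() && "read() on an empty channel");
        T v = fifo_.front();
        fifo_.pop_front();
        return v;
    }

private:
    std::deque<T> fifo_;
};

// Clocked control block: tick() models one clock cycle and runs whether
// or not data has arrived.
struct ControlBlock {
    int last_cmd = -1;
    int idle_cycles = 0;

    // Serve at most one command per cycle, urgent channel first.
    void tick(Channel<int>& urgent, Channel<int>& normal) {
        if (urgent.available()) {
            last_cmd = urgent.read();
            idle_cycles = 0;
        } else if (normal.available()) {
            last_cmd = normal.read();
            idle_cycles = 0;
        } else {
            ++idle_cycles;  // no data on either channel: state still advances
        }
    }
};
```

Querying before reading is what lets the clocked side stay in lockstep with the clock while the writing side runs at the pace of its data.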
THE VERIFICATION PIECE
Getting these complex control-logic
blocks from C++ to RTL is one thing.
However, making sure they still function
properly at that level of abstraction
is another. “It’s easy to create RTL
just to lose the benefit of getting there
faster by overcomplicating verification,”
says McCloud.
The C++ representation of a control-logic
block is very different from that
same block at RTL, where there are
pin-level interfaces, memory arrays,
clocks, and so on.
If the block exhibits unexpected
behavior after synthesis to RTL, the
challenge is figuring out why. Mentor
has filed for a patent on a technique for
providing the necessary debug visibility
into these kinds of aberrant behaviors.
The technique involves back-annotation
of the RTL behavior onto the
C++ source code. Designers can thus
execute the C++ source code with the
RTL behavior overlaid, enabling them to
validate detailed RTL block interactions
at the C level.
A final enhancement lies in power
optimization. Many design teams have
adopted clock gating as a power-management
tool, but the insertion of clock
gating is typically a manual process.
In general, the team’s power expert
examines the RTL code to identify
registers that are candidates for clock
gating. It’s a tedious, time-consuming
step. Moreover, it’s pretty easy to overlook
candidates for clock gating.
Because Catapult C synthesizes
the RTL from an untimed description,
the tool can glean knowledge about
the design through detailed sequential
analysis. It uses the result of that analysis
to automate the process of multilevel
clock gating. At register-level granularity,
the tool decides which registers
should be gated before it produces the
RTL. Thus, the RTL it does produce will
include clock gating on all registers that
can benefit from it.
How much clock gating is created
for a given design is very much design- and
vector-dependent, explains
McCloud. “We’ve seen anywhere from
10% to 90% power savings,” he says.
On average, power consumption is
reduced by 40%.
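A toy model helps show why the savings vary so widely. The sketch below is an assumption, not Catapult C's sequential analysis: it takes a per-cycle trace of register load-enables and reports the fraction of register clock events that register-level gating would suppress. Because the result depends only on how often the enables fire, the same design can score very differently under different test vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (an assumption, not Catapult C's analysis). Without gating,
// every register sees every clock edge; with register-level gating, a
// register is clocked only in cycles where its load-enable is high.
// The result is a rough proxy for the share of clock power saved.
double gated_fraction(const std::vector<std::vector<bool>>& enables) {
    // enables[cycle][reg] is true when register reg loads in that cycle.
    std::size_t total = 0;
    std::size_t clocked = 0;
    for (const auto& cycle : enables) {
        for (bool en : cycle) {
            ++total;
            if (en) ++clocked;
        }
    }
    return total == 0 ? 0.0 : 1.0 - static_cast<double>(clocked) / total;
}
```

For a two-register bank where only one register loads, and only in the first of two cycles, three of the four clock events are gated, giving 0.75; a trace in which every register loads every cycle gives 0.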
The 2009a release of Catapult C
Synthesis is available now. Pricing for
the product ranges from $140,000 to
$390,000 for time-based licenses.
MENTOR GRAPHICS
www.mentor.com/products/esl/catapult-c