Four Technologies Converge In Hardware Emulation

The EDA industry has evolved into a case study on specialization. This was not true in the early days, when the MDV (Mentor, Daisy, Valid) trio dashed onto the world stage. All three were one-stop shops.

But a decade later, myriad startups populated the semiconductor and electronic system design automation landscape, each of them addressing one stage of the design process with a complex technology aimed at solving one specific problem. In fact, smaller startups concentrated on narrow problems, looking for success by focusing more intensely than the incumbents.

Entering the second decade of the new millennium, the picture is almost a mirror of what it was 20 years ago. The only relevant difference is that “D” and “V” have been replaced by “S” and “C,” and “M” has been relegated to third place, as in SCM (Synopsys, Cadence, Mentor). The landscape continues to be inhabited by a legion of startups driven by the same objective: how to solve one specific design problem with one unique technology.

But there is one exception in this scenario, namely, hardware emulation. Developing a hardware emulation system requires the mastering of multiple disciplines. Unlike all other design automation point tools, hardware emulation still tends to reflect the EDA industry as a whole.

The Four Pillars

1. Four fundamental technologies combine to implement a leading-edge emulation system: hardware development; RTL compilation; run-time control and debug; and emulation VIP creation.

Metaphorically speaking, a hardware emulation system sits on four technological pillars: the hardware, the compiler, the run-time environment, and the supporting verification intellectual properties, or VIPs (Fig. 1). To ensure successful implementation, the emulator design team must be proficient in four critical technological areas:

• Hardware development

• Software development: RTL compilation

• Software development: run-time control and debug

• Emulation VIP creation

The EDA industry addresses all four fields with dedicated tools. But to create a leading-edge emulation system, developers must fine-tune all tools to be parts of one integrated solution. Let’s examine the relevant aspects of the four domains as applied to a modern emulator.

Creating The Hardware

First, let’s clarify that this article focuses on commercial FPGA-based emulation systems. Emulators based on custom silicon, whether processor-based or custom FPGA-based, add a fifth dimension to the four technological fields listed above.

Economy of scale underpins the belief that in the future only commercial FPGA-based emulators will be developed. The cost of designing custom chips at advanced process nodes is becoming prohibitive and will not be justified by the limited size of the hardware emulation market, even though the pressure to move to those nodes will continue.

Creating a universal FPGA platform for emulation that addresses any design type, regardless of size and topology, is a major project, considerably more complex than designing an FPGA prototyping board targeting one specific design. In the latter case, the design team can make calculated tradeoffs to achieve the gates, clocks, and I/Os required for its purpose.

Developing a general-purpose platform that can accommodate a wide range of designs with complexities ranging from a few million ASIC-equivalent gates to well over a billion gates, potentially driven by a multitude of complex clock trees fed by tens of asynchronous primary clocks, large banks of complex memories, and extensive I/O connectivity, is much harder.

It is certainly harder than developing a general-purpose FPGA prototyping board. For starters, FPGA prototyping boards comprise a limited number of FPGA devices, where 20 may be an upper limit. Second, there are different performance/flexibility tradeoff requirements. A prototyping board targets higher performance, but typically with less emphasis on debugging and bring-up time.

By contrast, emulation systems must be expandable to encompass several hundred FPGAs. They sacrifice some level of performance for the sake of flexibility and generality. They also must enable users to handle multiple revisions of a design with minimal hand-crafting. Regardless, emulators must still ensure that sufficient signals, gates, clocks, memories, I/Os, and routing options are available to handle a wide range of designs at a non-trivial performance point.

2. Leading-edge emulation hardware includes several components, such as chassis, boards, chips, power supplies, and fans.

Add to that the task of designing the supporting mechanics, chassis and all, with the goal of minimizing power consumption, dimensions, and weight and ensuring high reliability (Fig. 2). All of these factors make the case that a successful hardware emulator must master the art of hardware design.

Getting A Design Onto The Hardware

Once you have the hardware, the next step calls for mapping the design-under-test (DUT) onto the box. The process is known as compilation.

While the team developing the hardware portion of a modern emulator may include only a few hardware designers, the team designing the compiler may encompass dozens of software designers. Considering what is required, the compiler for an emulator absorbs the largest slice of the R&D personnel budget.

A successful compiler brings together a diverse set of technologies, some more complex than others—register transfer level (RTL) parsing, synthesis, netlist partitioning, timing analysis, clock mapping, memory mapping, board routing, and FPGA placing and routing—just to stick to the main tasks. The requirement is to map the system-on-chip (SoC) RTL into a timing-correct, fully placed and routed system of tens or even hundreds of FPGAs.

Starting with the parsing of the DUT description using any combination of the three popular hardware-description languages (Verilog, VHDL, SystemVerilog), the RTL code is then synthesized into a gate-level netlist. Since a primary requirement for an emulation system is that design turns can be done quickly, synthesis optimizations can be sacrificed in the interest of producing a netlist as quickly as possible. Synthesis tool vendors do not typically offer a tool built around this tradeoff, forcing the compiler team to develop its own.

Once the design has been synthesized, the netlist must be partitioned across an array of FPGAs to implement the DUT. This is easier said than done. In splitting the netlist, a less than perfect partitioner may assign uneven blocks of logic to one or more FPGAs, causing an abrupt spike in interconnectivity that requires pin multiplexing at factors of 16x or 32x or possibly even higher. The impact on the emulation speed then can be dramatic.
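The link between the number of signals crossing an FPGA boundary and the pin-multiplexing factor comes down to simple arithmetic. The following is a minimal sketch, not a description of any vendor's partitioner: the function names and the signal and pin counts are hypothetical, and rounding the factor up to a power of two is an assumption made for illustration.

```python
import math

def mux_ratio(cut_signals: int, physical_pins: int) -> int:
    """Smallest time-multiplexing factor that fits cut_signals onto physical_pins."""
    if physical_pins <= 0:
        raise ValueError("physical_pins must be positive")
    return max(1, math.ceil(cut_signals / physical_pins))

def next_power_of_two(n: int) -> int:
    """Round n up to a power of two (assumed granularity of the multiplexers)."""
    return 1 << (n - 1).bit_length()

# Hypothetical figures: two partitions of the same design onto FPGAs
# that each expose 1,000 pins for inter-FPGA signals.
balanced = mux_ratio(cut_signals=4_000, physical_pins=1_000)     # 4
unbalanced = mux_ratio(cut_signals=14_500, physical_pins=1_000)  # 15

print(balanced, next_power_of_two(balanced))      # 4 4
print(unbalanced, next_power_of_two(unbalanced))  # 15 16
```

Under these assumed numbers, the uneven partition forces roughly four times as many signals through each physical pin, which is the kind of spike that drags down emulation speed.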

Likewise, a partitioner that does not understand timing may introduce long critical paths on combinational signals by routing them through multiple FPGAs, called hops, which can be detrimental to the max speed of emulation. Here, an accurate timing analysis tool can identify such long critical paths and avoid those hops. This partitioning technology is not common and the tool, then, must be developed from scratch.
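The effect of hops on speed can be illustrated with a small model. This is a sketch under stated assumptions: the path, the two FPGA assignments, and the delay figures are invented for illustration, and a real timing analyzer considers far more than a fixed per-hop penalty.

```python
def count_hops(path, fpga_of):
    """Count FPGA boundary crossings along a combinational path."""
    return sum(1 for a, b in zip(path, path[1:]) if fpga_of[a] != fpga_of[b])

def max_frequency_mhz(logic_delay_ns, hops, hop_delay_ns):
    """Clock rate allowed by a path with the given logic delay and number of hops."""
    return 1000.0 / (logic_delay_ns + hops * hop_delay_ns)

# Hypothetical critical path through four logic blocks.
path = ["a", "b", "c", "d"]

# Two candidate partitions: block name -> FPGA index.
timing_aware = {"a": 0, "b": 0, "c": 1, "d": 1}
timing_blind = {"a": 0, "b": 1, "c": 0, "d": 2}

for name, assignment in (("timing-aware", timing_aware), ("timing-blind", timing_blind)):
    hops = count_hops(path, assignment)
    # Assumed delays: 50 ns of logic, 150 ns per inter-FPGA hop.
    print(name, hops, max_frequency_mhz(50.0, hops, 150.0))
```

With these assumed delays, cutting the path once instead of three times raises the achievable clock rate from 2 MHz to 5 MHz.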

The need to map clocks efficiently raises an even greater challenge. Modern designs can use hundreds of thousands of derived clocks distributed over hundreds of FPGAs. Designers reduce power consumption by using complex clock-gating strategies. A disproportionate amount of effort goes into the compiler’s ability to manage these clocks well.

In an emulation system, DUT memories are implemented via memory models that configure the on-board standard memory chips to act as specialized memory devices, such as DDR3 SDRAM and GDDR5 SDRAM. In the context of this article, the creation of memory models falls under the fourth discipline (emulation VIP creation) described later.

After all of this is done—not an easy accomplishment—the FPGAs must be placed-and-routed. Typically, the FPGA vendor supplies the FPGA P&R tool. However, the emulator user ought to be oblivious to this dependency, forcing the compilation developers to encapsulate the P&R tool in an environment that appears native to the compiler.

Compiling a design is a time-consuming process, rather dependent on the design size and complexity. In an attempt to speed up the undertaking, the process is heavily parallelized in multiple threads that can run concurrently on farms of PCs. This parallelization adds another dimension to the already difficult mission of designing a compiler (Fig. 3).

3. A compilation flow for a leading-edge emulation system encompasses numerous stages, from synthesis to partitioning to place-and-route.

The compiler for an emulator ends up requiring leading-edge synthesis, partitioning, timing analysis, clock mapping, and place-and-route technologies, compounded by the need to be heavily parallelized. It is no wonder that it takes so much attention from the R&D team.
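Once the design is partitioned, each FPGA can be placed and routed independently, which is what makes the parallelization possible. The sketch below shows the idea with Python's standard `concurrent.futures` module. The `place_and_route` function is a hypothetical stand-in for invoking the vendor's P&R tool, and a thread pool on one machine stands in for dispatching jobs to a farm of PCs.

```python
from concurrent.futures import ThreadPoolExecutor

def place_and_route(fpga_id):
    """Hypothetical stand-in for running the vendor P&R tool on one FPGA's netlist."""
    return fpga_id, f"fpga_{fpga_id}.bit"

def compile_all(fpga_ids, max_workers=4):
    """Run one P&R job per FPGA concurrently and collect the resulting bitstream names."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of fpga_ids, whatever the finish order.
        return dict(pool.map(place_and_route, fpga_ids))

bitstreams = compile_all(range(8))
print(len(bitstreams), bitstreams[3])
```

In this arrangement the total compile time is bounded by the slowest single FPGA rather than by the sum of all jobs, provided enough workers are available.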

Emulating And Debugging The DUT

Now the DUT is mapped inside the emulation platform and is ready to be emulated. This brings up the third technology leg that has to be tackled, the run-time software. The run-time software development team, smaller than the compiler development team, has to deal with the environment that drives the DUT. It applies the stimulus, collects and processes the response, and handles all the collateral such as SystemVerilog assertions (SVAs), monitors, and checkers required to control and debug the DUT. The run-time software comprises two components.

The first component is tasked with bringing the hardware to life and executing the DUT. It is the combination of the operating system running in the PC and extensive firmware loaded in the emulator. The two in symbiosis manage the I/O operations of the DUT mapped inside the platform and enable the user to start, stop, rewind, loop, single-step, save, and restore—all of the standard moves.

Unlike the forebears of the modern emulators, which were deployed in only one operational mode called in-circuit emulation (ICE), today's emulators may be deployed in multiple modes of operation, mainly ICE mode, where physical testbenches drive the DUT, and acceleration mode, where software testbenches drive it.

In ICE mode, the emulator is plugged into a physical target system in place of a yet-to-be-built chip so the whole system can be debugged with live data. The acceleration mode can be further divided into two sub-classes: cycle-based acceleration and transaction-based acceleration.

The run-time software that controls the emulator must support all of the above. It is intimately tied with the operation of the design itself. It deals with any real-time issues or virtualization layers that might be required to make the hardware and processing elements available to the applications that will ultimately be visible to the system user (Fig. 4).

4. A leading-edge emulation system must support multiple modes of execution.

The second component is the debugger. Unlike debugging a design in a logic simulator, where the algorithmic nature of the tool ensures total design visibility and controllability, debugging in an emulation system requires visibility and controllability to be created in the hardware.

Visibility and controllability can be built at compile time via the insertion of probes or of dedicated instrumentation, or they can be pre-built in the fabric of the FPGAs. In all cases, the debugger manages visibility and controllability.

The debugger has to efficiently manage the data capture at run-time. It also must support all modern debugging means such as SVAs, checkers, monitors, and coverage. With an emulator, the user may be dealing with billions of gates’ worth of functionality spanning billions of verification cycles. A more comprehensive debugger is unlikely to be found anywhere else (Fig. 5).

5. SoC debug via a leading-edge emulation system requires a multi-hierarchical level approach.
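One common way to keep run-time data capture manageable is to retain only a window of recent cycles and freeze it when a trigger fires. The following is a minimal sketch of that idea, not the mechanism of any particular emulator: the function name, the sample values, and the trigger condition are all hypothetical.

```python
from collections import deque

def capture_window(samples, trigger, depth):
    """Return the last `depth` (cycle, value) pairs up to and including the first trigger hit."""
    window = deque(maxlen=depth)  # the oldest entries fall off automatically
    for cycle, value in enumerate(samples):
        window.append((cycle, value))
        if trigger(value):
            return list(window)
    return []  # the trigger never fired

# Hypothetical probe values, one per emulation cycle; trigger on the value 5.
trace = capture_window([0, 0, 1, 0, 5, 0], trigger=lambda v: v == 5, depth=3)
print(trace)  # [(2, 1), (3, 0), (4, 5)]
```

The bounded buffer is the point of the sketch: storage stays constant however many billions of cycles run before the event of interest.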

Creating The Supporting Ecosystem

Multiple aspects are involved in building the ecosystem supporting an emulator. Some of them are related to the design compilation, others to the run-time execution.

The description of the compilation process referred to memory models. The creation of memory models, essential to the deployment of the emulator, is not rocket science. It still adds one more undertaking to the implementation of a successful emulator, though, and requires a specific level of expertise.

Emulators can act as accelerators within a larger verification environment that includes a host running software on a virtual platform. While the hardware in the emulator can provide acceleration of functionality on a precise cycle-by-cycle basis, the virtual platform can accelerate software that doesn’t require that level of accuracy. Still, the two have to talk to each other.

One way to address this necessity is to un-time the interface between the virtual platform and the emulator by inserting a transactor. This piece of high-level VIP abstracts the details of a function or communication protocol into a set of transactions. Obviously, the specifics of each transactor—what transactions are available and how they operate—will differ by protocol.

Transactors include two parts that are placed on either side of the Standard Co-emulation Modeling Interface (SCE-MI) that connects the host to the emulator and ensure that the communication occurs quickly enough to avoid the cancellation of the benefits of acceleration.
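The two-part structure can be sketched in a few lines. This is an illustration only: the `Channel` class is a stand-in for the host-to-emulator link and does not reproduce the actual SCE-MI API, and the class names, the write transaction, and the two-cycle pin protocol are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class WriteTxn:
    """One untimed transaction: write `data` to `addr`."""
    addr: int
    data: int

class Channel:
    """Stand-in for the message channel between host and emulator (not the SCE-MI API)."""
    def __init__(self):
        self._queue = deque()

    def send(self, msg):
        self._queue.append(msg)

    def receive(self):
        return self._queue.popleft() if self._queue else None

class HostProxy:
    """Host-side half: the testbench calls write() and never sees clock cycles."""
    def __init__(self, channel):
        self.channel = channel

    def write(self, addr, data):
        self.channel.send(WriteTxn(addr, data))

class EmulatorBfm:
    """Emulator-side half: expands each transaction into per-cycle pin values."""
    def __init__(self, channel):
        self.channel = channel

    def drive(self):
        cycles = []
        while (txn := self.channel.receive()) is not None:
            cycles.append({"valid": 1, "addr": txn.addr, "data": txn.data})  # transfer cycle
            cycles.append({"valid": 0, "addr": 0, "data": 0})                # idle cycle
        return cycles

channel = Channel()
HostProxy(channel).write(addr=0x10, data=0xAB)
pin_activity = EmulatorBfm(channel).drive()
print(len(pin_activity))  # 2
```

One message crosses the link per transaction rather than one per clock cycle, which is why the un-timed interface preserves the benefit of acceleration.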

A winning deployment of an emulation platform must make transactors available for a wide range of protocols and functions. VIP transactors can be obtained off the shelf. Building them requires the kind of IP design and management skills that an IP company has.

But emulator users may also require custom transactors. This requirement necessitates the development of a tool for taking any abstract behavior and building a transactor for use between the host and emulator. Such a tool is unlikely to be found elsewhere, except in the suite of tools supplied with the emulator.

Broadly speaking, there is a third category of emulation VIPs: speed adapters. They are not software IP. Rather, they are electronic boards that act as FIFOs when the emulator is deployed in ICE mode. Real target systems typically run at rates from hundreds of megahertz up to gigahertz, which can be three orders of magnitude faster than the speed of the emulator. The insertion of speed adapters between each of the emulator I/Os and the target system ensures that the two clock domains communicate without data losses (Fig. 6).

6. A leading-edge emulation system necessitates broad VIP libraries, from transactors to memory models to speed adapters.

Granted, the design of speed adapters falls in the hardware design realm, but it requires rather different design skills than those needed for the development of the emulator. It draws from the IP discipline and necessitates know-how that the designers of the emulator typically do not have.
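The buffering role of a speed adapter can be illustrated with a toy simulation of a FIFO that is filled at the target system's rate and drained at the emulator's rate. This is a sketch under simplifying assumptions: the function name, burst lengths, and speed ratios are hypothetical, and real speed adapters also depend on protocol-level flow control, which is not modeled here.

```python
from collections import deque

def peak_occupancy(burst_len, ratio):
    """Peak FIFO depth when one word arrives per fast tick and one leaves every `ratio` ticks."""
    fifo = deque()
    peak = 0
    for tick in range(burst_len):
        fifo.append(tick)             # target system writes at full speed
        peak = max(peak, len(fifo))
        if (tick + 1) % ratio == 0:   # emulator reads once every `ratio` ticks
            fifo.popleft()
    return peak

print(peak_occupancy(burst_len=16, ratio=4))       # 13
print(peak_occupancy(burst_len=1000, ratio=1000))  # 1000
```

At a speed ratio of 1000, the adapter has to hold essentially the whole burst, since the emulator consumes only one word in the time the target system delivers a thousand.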

Conclusions

To create a leading-edge emulation system, the development team must address four main domains of design automation know-how and draw from widely disparate technology fields. It also must ensure that all the elements that make up an emulator work together cohesively.

By way of an analogy, think of a symphonic orchestra, where the performers play their instruments—as different as they are—together following a music score that produces a sublime sound touching deeply into the audience’s soul.

For an emulator to be competitive, all of these technologies have to perform flawlessly. There is little room for error. If you don’t do well in all areas, then you don’t do well at all.

Lauro Rizzatti is the senior director of marketing for emulation at Synopsys. He joined Synopsys via the acquisition of EVE. Prior to the acquisition, he was general manager of EVE-USA and marketing VP. In the past, he held positions in management, product marketing, technical marketing, and engineering.