At the 45th Design Automation Conference in June 2008, the Open SystemC Initiative (OSCI) announced the ratification of the TLM-2.0 standard, enabling interoperability for transaction-level models (TLMs). This pivotal announcement for the electronic design automation (EDA) industry marked the beginning of an era of interoperable SystemC-based virtual platforms for embedded software development, verification, and architectural exploration. During the draft proposal process earlier in the year, OSCI received and incorporated feedback from major companies representing the chip, software-development, and design-tool communities. The next steps after ratification were to formalize a TLM-2.0 language reference manual (LRM) and eventually contribute the LRM to IEEE for further standardization once complete.

Arguably, the magnitude of this standardization can be compared to the introduction of Verilog in the late 1980s, which hastened the demise of proprietary hardware-description languages (HDLs) such as HiLo, DABL, UDL/I, and others. Today, the new SystemC TLM-2.0 specification finally includes the transaction abstractions that enable all virtual platform components to communicate with each other and interoperate. The ultimate result will be the obsolescence of proprietary languages and techniques as a means to assemble virtual platforms for early embedded-software development, verification, and architecture exploration.

Since the late 1990s, the importance of software to the overall success of chip-development projects has been highly debated. Case in point: in IEEE Spectrum’s 1999 EDA trends article, Wilf Corrigan, then the chief executive officer of LSI Logic Corp., was quoted as saying that the most pressing need for new EDA tools is a better methodology “to allow software developers to begin software verification more near the same time that chip developers begin to verify the hardware.”

Since 2007, Synopsys and IBS Inc. have confirmed the importance of software development over the course of 12 customer projects with a variety of characteristics (Fig. 1). The overall software-development effort was found to average 45% of the total effort, and in extreme cases reached 62%. Another notable fact about this project analysis was that hardware intellectual-property (IP) reuse was shown to be well over 50%, and even the reuse of software reached almost 40%.

When diving into more detail on these projects, other frequently debated issues were also resolved. First, RTL verification, including qualification of IP, dominated hardware development both in terms of overall effort and in elapsed time. Second, software development dominated projects overall—again, both in overall effort and elapsed time. Third, the actual elapsed time required to develop the software was longer than that needed to get from the hardware requirements stage to tapeout.

As a result of these findings, and with confirmation of the increased importance of software, the focus has now shifted to determining a) how to optimize software development so that it starts as early as possible in the project process on representations of the hardware under development, and b) how to increase software development productivity.

With respect to starting software development earlier, project data makes it clear that relying on RTL-dependent methods to enable software development will not be enough (Fig. 2 a, b). Getting to verified RTL alone makes up over 50% of the elapsed time from requirements to tapeout of the hardware. Adding in the elapsed time for RTL development and efforts on IP qualification, stable RTL is often achieved after almost 60% to 70% of the time to tapeout has progressed. This still leaves plenty of time before actual silicon becomes available, but demand is high for methods that allow hardware-dependent software development even earlier.

Development productivity surveys typically point to “debug” as the biggest issue plaguing hardware/software development. As a result, enhanced insights into the hardware and software debugging process are required.

As the above data shows, a purely sequential approach (developing software after the hardware becomes available) is a good recipe for missing market windows and reducing profits. The development of software takes almost as long as that of hardware, and a sequential flow almost doubles the development time. Once RTL is available, hardware-assisted techniques like emulation and FPGA prototyping can be employed. While these techniques allow bring-up before silicon becomes available, their dependence on RTL delays their availability until after 70% of the time to tapeout has already gone by.

The effective solution to overcoming the time-to-availability issues—virtual platforms—has been extant since the late 1990s. Virtual platforms are fully functional software representations of a hardware design that encompass a single- or multi-core system-on-a-chip (SoC), peripheral devices, I/O, and the user interface. A virtual platform runs on a general-purpose PC or workstation and is detailed enough to execute unmodified production code, including drivers, the operating system (OS), and applications at reasonable simulation speed. Users have articulated the need for virtual platforms not to be slower than one-tenth of real time to be effective for embedded software development. The achievable simulation speed depends on the level of model abstraction, which also determines the platform’s accuracy.

Virtual platforms that suffice for early driver development can be made available within four to six weeks of availability of a stable hardware architecture specification. Even at the beginning of an SoC design flow, the system architecture can be defined to a level of detail that enables an unambiguous executable specification. Virtual platforms can be used to develop and integrate the software, which can in turn be used to refine the hardware architecture in an iterative process. The platform doesn’t have to arrive all at once: a phased, “just-in-time” delivery is sufficient, starting with an instruction-accurate version and delivering timed versions later.

When virtual platforms first emerged in the late 1990s, there were as yet no modeling standards. Thus, the pioneers in virtual platform technology—AXYS Design Automation, VaST Systems Technology, Virtio Corp., and Virtutech—had to develop proprietary modeling solutions. At the time, EDA companies such as CoWare, Synopsys, Mentor Graphics, and Cadence were focused less on enabling software development and more on architecture exploration and verification. This was directly reflected in the early days of SystemC and limited its applicability, as it failed to provide virtual platforms that were fast enough for embedded software development.

By 2006, early adopters of virtual platforms had turned to proprietary offerings; these stalwarts were willing to sacrifice flexibility, model interoperability, and standards compliance to get a working solution. But the mainstream adoption of virtual platforms was stymied by the incompatibilities between TLMs generated by these various proprietary offerings. These incompatibilities had to be resolved to make sure that a user’s investment in modeling would pay off.

In early 2007, Synopsys donated key technologies to the OSCI TLM working group. The resulting TLM-2.0 API standard, focusing on the interoperability of TLMs, now offers standard techniques for model interfaces. This includes a standard transaction payload, temporal decoupling, direct memory interfaces, and timing annotation, which introduces the loosely timed (LT) and approximately timed (AT) modeling styles to enable various levels of accuracy.

Specifically, the blocking interface for LT modeling supports temporal decoupling. Models can block simulation to execute their functionality, or they may return immediately (i.e., not block simulation) with an optional annotation of timing. The LT modeling application programming interface (API) was designed for ease of use and doesn’t require a backward path. For virtual platforms, an estimated 90% of all cases can be dealt with by using a combination of immediate return and timing annotation.
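The immediate-return-plus-annotation pattern described above can be illustrated with a minimal sketch in plain C++. It does not use the SystemC library: `Payload`, `Memory`, `write_then_read`, and the simplified `b_transport` signature are hypothetical stand-ins loosely modeled on the TLM-2.0 generic payload and blocking transport call, and the 10-ns latency in the usage note is an assumed figure. The target never blocks; it performs the access and adds its latency to a delay argument passed by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified stand-in for a transaction payload (hypothetical, not tlm_generic_payload).
enum class Command { Read, Write };

struct Payload {
    Command command;
    std::uint64_t address;
    unsigned char* data;
    std::size_t length;
    bool ok;
};

// Hypothetical memory target with an assumed fixed latency per access.
class Memory {
public:
    Memory(std::size_t size, std::uint64_t latency_ns)
        : storage_(size, 0), latency_ns_(latency_ns) {}

    // Blocking-style transport that returns immediately: it executes the
    // access and annotates timing by adding the latency to delay_ns.
    void b_transport(Payload& p, std::uint64_t& delay_ns) {
        if (p.address + p.length > storage_.size()) {
            p.ok = false;  // out-of-range access, no timing added
            return;
        }
        if (p.command == Command::Write)
            std::memcpy(&storage_[p.address], p.data, p.length);
        else
            std::memcpy(p.data, &storage_[p.address], p.length);
        delay_ns += latency_ns_;
        p.ok = true;
    }

private:
    std::vector<unsigned char> storage_;
    std::uint64_t latency_ns_;
};

// Initiator helper: write a 32-bit word, read it back, and accumulate
// the annotated delay of both accesses in delay_ns.
std::uint32_t write_then_read(Memory& mem, std::uint64_t addr,
                              std::uint32_t value, std::uint64_t& delay_ns) {
    Payload w{Command::Write, addr,
              reinterpret_cast<unsigned char*>(&value), sizeof value, false};
    mem.b_transport(w, delay_ns);

    std::uint32_t result = 0;
    Payload r{Command::Read, addr,
              reinterpret_cast<unsigned char*>(&result), sizeof result, false};
    mem.b_transport(r, delay_ns);
    return result;
}
```

With a `Memory` of 64 bytes and a 10-ns latency, one call to `write_then_read` returns the written value and leaves 20 ns in the delay variable. The initiator decides when, if ever, to turn that accumulated delay into a real wait, which is what keeps the scheduler out of the common path.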

The non-blocking interface for AT modeling supports multiple phases and timing points using an extensible scheme. It requires a backward path. The SystemC TLM-2.0 API also introduces sockets that support backward- and forward-path TLM communication, and keeps LT and AT modeling interoperable via shared mechanisms for sockets, payloads, and extensions.
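To make the idea of forward and backward paths concrete, here is a small sketch in plain C++ rather than SystemC. The types `InitiatorSocket` and `TargetSocket`, the function `bind_sockets`, and the string messages are all hypothetical simplifications; real TLM-2.0 sockets carry payloads, phases, and timing rather than strings. The point is only that one binding gives the initiator a way to call the target (forward) and the target a way to call the initiator back (backward).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct TargetSocket;

// Hypothetical initiator-side socket: sends forward, receives backward.
struct InitiatorSocket {
    std::function<void(const std::string&)> on_backward;
    TargetSocket* peer = nullptr;
    void send_forward(const std::string& msg);
};

// Hypothetical target-side socket: receives forward, sends backward.
struct TargetSocket {
    std::function<void(const std::string&)> on_forward;
    InitiatorSocket* peer = nullptr;
    void send_backward(const std::string& msg) {
        if (peer && peer->on_backward) peer->on_backward(msg);
    }
};

inline void InitiatorSocket::send_forward(const std::string& msg) {
    if (peer && peer->on_forward) peer->on_forward(msg);
}

// A single binding step connects both directions.
void bind_sockets(InitiatorSocket& i, TargetSocket& t) {
    i.peer = &t;
    t.peer = &i;
}

// Drive one request over the forward path and one response over the
// backward path, logging what each side receives.
std::vector<std::string> round_trip() {
    std::vector<std::string> log;
    InitiatorSocket init;
    TargetSocket targ;
    bind_sockets(init, targ);

    targ.on_forward = [&](const std::string& m) {
        log.push_back("target got " + m);
        targ.send_backward("response");
    };
    init.on_backward = [&](const std::string& m) {
        log.push_back("initiator got " + m);
    };

    init.send_forward("request");
    return log;
}
```

An LT-style model would use only the forward direction of such a pair, while an AT-style model would use both. Sharing the socket mechanism is what lets the two styles coexist in one platform.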

Consider the example of a simplified diagram of the wireless design chain (Fig. 3). It consists of hardware intellectual-property (IP) providers, semiconductor vendors, and system integrators. All of them interact with software suppliers of IP and tools. All design-chain partners have specific technical and business concerns when it comes to collaboration and interaction among them.

Hardware IP providers are focused on architecture analysis and feature definition for their components. The main objective is to get “designed in” by semiconductor houses. To achieve that goal, hardware IP providers seek early feedback from their customers and must be able to perform early architecture analysis for hardware/software systems. While traditional hardware-IP providers have focused on hardware only, today they need to ensure early in the design process that their components, such as processors, interact appropriately with the software executing on them, or even provide essential software support.

Semiconductor suppliers have similar concerns—they, too, need to perform architecture analysis of their components and improve the likelihood of being “designed in” by their customers, who are subsystem suppliers. Even more than hardware IP providers, semiconductor suppliers today are expected to take full responsibility for device drivers and OS porting to their chips. As a result, they’re increasingly concerned about delivering software as early as possible. They seek new ways to advance software development to earlier project stages and increase software development productivity.

Finally, system integrators are concerned about software-development productivity, quality, reliability, and cost. They often must perform software integration across multiple, communicating processors in the SoC and require the earliest possible access to software-development environments representing what the supply chain will provide to them.

The needs of the various stakeholders in the design chain can be addressed with three different levels of accuracy: (1) LT modeling with very limited timing; (2) AT modeling, which offers flexibility in trading off accuracy and speed; and (3) cycle-accurate modeling (Fig. 4).

For pre-silicon software development and integration, the LT modeling style offers the right amount of speed to satisfy software developers. Timing accuracy is limited, but the models of the virtualized hardware are register-accurate and functionally accurate, enabling the same software image to run on the virtual platform as on the actual target hardware, i.e., binary compatibility.

Beyond actually modeling hardware execution, in the application view, or AV (Fig. 4, again), software function calls, such as OpenGL high-performance graphics API calls, are intercepted and passed on to the host OS, where they’re executed as native calls. This works well for application development, requiring less-detailed hardware models and abstracting the system at the OS API level.

To achieve higher simulation speed, each process in an LT model simulation runs ahead of the global simulation time, up to a boundary called a quantum. The simulation time-stamp advances in multiples of the quantum. Because processes run ahead independently, deterministic ordering between them has to be coded explicitly as synchronization points. The larger the quantum, the lower the scheduling overhead between processes. TLM-2.0 offers a predefined quantum keeper utility for the user-configurable quantum. For more detailed debugging, the quantum can be made smaller, and for higher speed, its value can be increased. Processes can check their local time against the quantum and synchronize, if necessary.
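The tradeoff between quantum size and scheduling overhead can be sketched in a few lines of plain C++. The `QuantumKeeper` class below is a hypothetical stand-in that only borrows the idea of the TLM-2.0 quantum keeper; it is not the library class. The global clock is reduced to a single integer, and the step and quantum sizes in the usage note are assumed values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a quantum keeper (not the TLM-2.0 utility class).
class QuantumKeeper {
public:
    explicit QuantumKeeper(std::uint64_t quantum_ns) : quantum_ns_(quantum_ns) {}

    // Advance the local time offset without involving the scheduler.
    void inc(std::uint64_t ns) { local_ns_ += ns; }

    // True once the process has run ahead by at least one quantum.
    bool need_sync() const { return local_ns_ >= quantum_ns_; }

    // Hand the accumulated local time to the global clock. In a real
    // simulator this is where the process would yield to the scheduler.
    void sync(std::uint64_t& global_ns) {
        global_ns += local_ns_;
        local_ns_ = 0;
        ++sync_count_;
    }

    std::uint64_t local_time() const { return local_ns_; }
    unsigned sync_count() const { return sync_count_; }

private:
    std::uint64_t quantum_ns_;
    std::uint64_t local_ns_ = 0;
    unsigned sync_count_ = 0;
};

// Execute `steps` instructions of `step_ns` each and return how many times
// the process had to synchronize with the global clock.
unsigned run_decoupled(std::uint64_t quantum_ns, unsigned steps,
                       std::uint64_t step_ns, std::uint64_t& global_ns) {
    QuantumKeeper qk(quantum_ns);
    for (unsigned i = 0; i < steps; ++i) {
        qk.inc(step_ns);
        if (qk.need_sync()) qk.sync(global_ns);
    }
    if (qk.local_time() > 0) qk.sync(global_ns);  // flush leftover local time
    return qk.sync_count();
}
```

Running 1000 steps of 10 ns each with a 100-ns quantum synchronizes 100 times, while a 1000-ns quantum synchronizes only 10 times. Both runs end at the same global time of 10,000 ns; what changes is how often the process gives other processes a chance to observe its progress.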

Combined with the LT coding paradigm, the direct memory interface (DMI) avoids the expensive traversal of bus hierarchies when accessing data and instructions in memory. DMI gives initiator modules a direct pointer to memories in a target. For instance, instruction-set simulators (ISSs) can bypass the sockets and transport calls, and have read or write access to the memory by default. Extensions may permit other kinds of accesses to enable security modes. In addition, SystemC defines debug access to memories that is free of both delays and side effects.
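The effect of DMI can be shown with a simplified sketch in plain C++. `DmiRegion`, `Ram`, and `Fetcher` are hypothetical names, and the `get_direct_mem_ptr` method here borrows only the name of the TLM-2.0 call, not its real signature. A counter on the regular access path makes the saving visible: once the pointer is granted, reads no longer go through a transport call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor of a granted direct-memory region (not tlm_dmi).
struct DmiRegion {
    unsigned char* base = nullptr;
    std::uint64_t start = 0;
    std::uint64_t end = 0;  // inclusive
    bool read_allowed = false;
    bool write_allowed = false;
};

// Hypothetical memory target offering both a regular path and a DMI grant.
class Ram {
public:
    explicit Ram(std::size_t size) : bytes_(size, 0) {}

    // Regular path: one counted call per access.
    unsigned char read(std::uint64_t addr) {
        ++transport_calls;
        return bytes_.at(addr);
    }
    void write(std::uint64_t addr, unsigned char v) {
        ++transport_calls;
        bytes_.at(addr) = v;
    }

    // Grant read and write access to the whole memory.
    bool get_direct_mem_ptr(DmiRegion& r) {
        if (bytes_.empty()) return false;
        r.base = bytes_.data();
        r.start = 0;
        r.end = bytes_.size() - 1;
        r.read_allowed = true;
        r.write_allowed = true;
        return true;
    }

    unsigned transport_calls = 0;

private:
    std::vector<unsigned char> bytes_;
};

// Initiator-side reader, e.g. the fetch unit of an instruction-set simulator.
class Fetcher {
public:
    explicit Fetcher(Ram& ram) : ram_(ram) {
        has_dmi_ = ram_.get_direct_mem_ptr(dmi_);
    }

    unsigned char read(std::uint64_t addr) {
        // Fast path: dereference the granted pointer directly.
        if (has_dmi_ && dmi_.read_allowed &&
            addr >= dmi_.start && addr <= dmi_.end)
            return dmi_.base[addr - dmi_.start];
        // Fallback: regular transport call.
        return ram_.read(addr);
    }

private:
    Ram& ram_;
    DmiRegion dmi_;
    bool has_dmi_ = false;
};
```

After one regular `write`, any number of `Fetcher::read` calls inside the granted range leaves `transport_calls` at 1. The sketch omits invalidation of a granted region, which a real platform also has to handle.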

For architecture exploration and real-time software development, the AT modeling level is an appropriate level of abstraction. It offers a flexible tradeoff between accuracy (by way of timing annotation) and speed, and reduces the overall model-creation effort, allowing for a broader and earlier exploration of the design space. In the AT coding style, each process is synchronized with the SystemC scheduler, and delays can be defined accurately or approximately.

In addition to simulation synchronization, SystemC defines a user-extensible base communication protocol for AT modeling. Four standard phases mark the beginning of a request, the end of a request, the beginning of a response, and the end of a response. With the associated timing parameters, the delay for accepting a request, the actual latency of the target, and the delay for accepting the response can be precisely modeled.
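As a worked illustration of the four phases, the following plain C++ sketch computes the timing point of each phase for one transaction. The names `Phase`, `AtTiming`, `TimingPoint`, and `trace_transaction` are hypothetical. The sketch also assumes, for simplicity, that the three delays follow one another strictly in sequence, which is one possible reading of the protocol and not the only one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The four standard phases of the base protocol, as named in the text.
enum class Phase { BeginReq, EndReq, BeginResp, EndResp };

struct TimingPoint {
    Phase phase;
    std::uint64_t time_ns;
};

// Assumed timing parameters for one target (hypothetical structure).
struct AtTiming {
    std::uint64_t request_accept_ns;   // BeginReq to EndReq
    std::uint64_t target_latency_ns;   // EndReq to BeginResp
    std::uint64_t response_accept_ns;  // BeginResp to EndResp
};

// Return the four timing points of a transaction that starts at start_ns.
std::vector<TimingPoint> trace_transaction(std::uint64_t start_ns,
                                           const AtTiming& t) {
    std::vector<TimingPoint> points;
    std::uint64_t now = start_ns;
    points.push_back({Phase::BeginReq, now});
    now += t.request_accept_ns;
    points.push_back({Phase::EndReq, now});
    now += t.target_latency_ns;
    points.push_back({Phase::BeginResp, now});
    now += t.response_accept_ns;
    points.push_back({Phase::EndResp, now});
    return points;
}
```

For a transaction starting at 100 ns with delays of 5, 40, and 2 ns, the phases fall at 100, 105, 145, and 147 ns. Setting any of the three delays to zero collapses two timing points into one, which is how a model can trade accuracy for speed within the same protocol.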

For system verification and timing validation, cycle-accurate models are the most suitable solution. However, they run slowly and are available later in the development phase because their development effort mirrors that of the RTL code used for the hardware development. In the past, cycle-accurate models were often developed as independent C models.

More recently, vendors have realized that it’s simply too costly to develop, verify, and maintain cycle-accurate C models. They now recommend using either C models created automatically from RTL or FPGA prototypes connected to the virtual world through high-speed transaction-level interfaces. The recent divestiture of ARM’s SoC Designer to Carbon Design Systems is an example of that trend.

In addition, the other abstraction levels can still be combined with RTL, resulting in a mixed abstraction-level simulation, to contribute to the overall verification effort. For instance, this allows embedded software to be included in the verification flow, enabling early hardware/software co-verification.

The standardization of the SystemC TLM-2.0 APIs has already had a profound impact on the virtual-platform market. All major industry players have endorsed SystemC TLM-2.0 APIs as the interconnect API best suited to assembling virtual platforms. With models now being truly interoperable, a restructuring has begun of the virtual platform offerings into interoperable libraries, simulators, and authoring environments. The structure of the Synopsys offerings in this space is a good example. The DesignWare System-Level Library, which has more than 100 TLM-2.0 based models, can be used in any SystemC TLM-2.0 simulator. The Innovator SystemC Integrated Development Environment (IDE) focuses on improving authoring and integration productivity. It offers not-yet-standardized capabilities like virtual and real I/O, performance profiling, register configuration, and debug interfaces.

Further restructuring of the tool landscape will likely occur as the connections of system-level design and verification strengthen and transform traditional hardware verification itself. In a recent survey, conducted by Synopsys at DVCon 2009, more than 50% of verification engineers stated that they’re already using software running on embedded processors in their design to verify the surrounding hardware. Verification trends of this nature may potentially trigger development of innovative tools spanning the hardware and software domains.