Looking back over the past 10 years or so, semiconductor process technology more or less kept pace with the demand for functionality in large-scale processor-based ICs. When the next-generation set-top box IC needed more horsepower, a move from, say, a 180-nm process to 130 nm would provide the necessary boost by adding gates and the ability to run faster clocks. But that next-generation chip would still carry a single processor.
Things have changed dramatically in the last few years. Simply put, silicon scaling alone no longer meets functionality requirements. Designers have therefore turned to multiprocessor architectures, which significantly raise the available processing power. The number of processors per chip is taking off, already exemplified several years ago by Cisco’s 192-processor engine for its CRS-1 network router (Fig. 1).
With the rise in processing power and complexity comes a host of issues that point largely toward the software side of the system equation. Writing software for a single-processor system is a relatively simple task, as a purely sequential approach will do the trick. But there’s little point in multiple processing engines if you’re not planning to have them execute instructions in parallel. How parallelism is imposed is the crux of the matter. Missteps can result in dire consequences, creating debug nightmares.
Fortunately for those looking to move to multiprocessor architectures in their system-on-a-chip (SoC) designs, tools and methodologies are beginning to appear. Designers can take steps to ensure that their parallelized application code won’t cause memory-access deadlocks, race conditions, or other faults that crash one or more processors or even their entire systems.
HOW MULTICORE LOOKS TODAY
Looking at a generic example of a multicore SoC can illustrate both the complexity of the devices and the programming challenges (Fig. 2). In a hypothetical transition from a 130-nm SoC with a single processor to a multicore implementation at 65 nm, designers would have roughly four times as many transistors to work with, because halving the feature size ideally quadruples the number of transistors that fit in a given area.
Multicore architectures ramp up complexity in ways beyond simply having multiple processors. The availability of more gates brings added memory, which is required to handle the increasingly large amounts of data, high-resolution video streams, and other content. The increased bandwidth means more I/Os to deal with all the data. More complex control processing is required by a myriad of network stacks and more elaborate user interfaces. “Designs are using more CPUs,” says Chris Rowen, CEO of Tensilica. “But that has only limited potential because of the way control paths are written.”
When considering multicore SoCs, an important distinction must be made between control-plane and data-plane processing. “In the data plane, there’s strong interest in integrating more functionality,” says Rowen. “Chips no longer process only audio, video, or wireless baseband, but rather they process all of them. Meanwhile, there’s growing complexity in each of these various functions. This puts a lot of pressure on a more programmable solution.”
Efforts to make the most of multiple processors often run aground on the shoals of memory access conflicts. “The old paradigm for multicore designs using shared memory was if things were happening in parallel, you’d want them to touch the memory at different address spaces,” says Limor Fix, general chair of the 45th Design Automation Conference and associate director of Intel Research Pittsburgh.
“The idea is for parallel threads not to interfere with each other, and to minimize the number of clocks required for the shared memory,” says Fix. “If each of the parallel computations is touching a different area in memory, there’s less collision and less locking of the shared memory.”
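The partitioning idea Fix describes can be sketched in a few lines. The example below is only an illustration under stated assumptions: Python threads stand in for processor cores, a plain list stands in for shared memory, and the names `fill_partition` and `run_partitioned` are hypothetical. Each worker writes only inside the address range it owns, so no lock is needed on the shared buffer.

```python
import threading

def fill_partition(shared, start, stop, worker_id):
    """Write only inside [start, stop), the range owned by this worker."""
    for i in range(start, stop):
        shared[i] = worker_id

def run_partitioned(num_workers=4, words_per_worker=8):
    """Give each worker a disjoint slice of one shared buffer."""
    # Hypothetical stand-in for a shared memory region.
    shared = [None] * (num_workers * words_per_worker)
    threads = []
    for w in range(num_workers):
        start = w * words_per_worker
        stop = start + words_per_worker
        t = threading.Thread(target=fill_partition,
                             args=(shared, start, stop, w))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()  # wait until every worker has finished its slice
    return shared

if __name__ == "__main__":
    print(run_partitioned(num_workers=4, words_per_worker=2))
```

Because no two workers ever touch the same element, the final contents of the buffer are the same no matter how the threads are scheduled.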
The problem lies in the fact that visibility into the design is extremely limited. “Typically, when working with RTL simulation models of processors, software debug relies on the general-purpose registers of the processor,” says Jim Kenney, product marketing manager at Mentor Graphics. “These registers are usually exposed at the top level for tracing in the waveform window of a logic simulator.”
Making matters worse is the fact that there may be only one debug port for several processors. With all processors executing instructions concurrently, it’s very difficult to control the speed of any given processor.
Debugging is made even harder by the absence of determinism. “With multiple processors, you don’t control what’s running,” says Michel Genard, vice president of marketing at Virtutech. Rerunning code often is of no value because the results can be different each time, making bugs hard to pin down. Then there’s the notion of “Heisenbugs,” or bugs that change or disappear when probes inserted to observe them alter the system’s behavior.
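To see why a rerun can give a different answer, consider two cores that each increment a shared counter with a separate load and store. The sketch below is a toy model, not real multiprocessor code: the names `run_schedule` and `all_outcomes` are hypothetical, and the interleavings are enumerated explicitly rather than left to a hardware scheduler.

```python
from itertools import permutations

def run_schedule(schedule):
    """Run the load/store steps of two toy cores in the given order.

    `schedule` lists which core executes next, e.g. (0, 1, 0, 1).
    Each core performs two steps: load the counter, then store counter + 1.
    """
    memory = {"counter": 0}
    regs = {0: 0, 1: 0}   # one private register per core
    step = {0: 0, 1: 0}   # 0 = load pending, 1 = store pending
    for core in schedule:
        if step[core] == 0:
            regs[core] = memory["counter"]       # load
        else:
            memory["counter"] = regs[core] + 1   # increment and store
        step[core] += 1
    return memory["counter"]

def all_outcomes():
    """Map every distinct interleaving to the final counter value."""
    schedules = sorted(set(permutations([0, 0, 1, 1])))
    return {s: run_schedule(s) for s in schedules}

if __name__ == "__main__":
    for schedule, result in all_outcomes().items():
        print(schedule, "->", result)
```

Only the two schedules in which one core finishes before the other starts yield a final count of 2. The four interleaved schedules lose an update and yield 1, which is the kind of timing-dependent result that makes such bugs hard to reproduce on real hardware.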
GOING VIRTUAL
Fortunately, there are ways around these issues, most of which come in the form of “virtualization” or “virtual platform” technology. Many benefits can be derived from virtual platforms (see “Multicore Design Benefits from Virtual Prototyping,” www.electronicdesign.com, ED Online 18637).
Once a virtual platform is assembled from hardware models, many of the issues concerning software debugging are addressed. The designer gains a great deal of control over the system, hence a return to a more deterministic scenario. The system configuration is easily varied in terms of the number and speed of cores as well as the software loads on each.
Virtual hardware offers good visibility into memory, processor registers, and device states. It also affords much more control over system execution: when you synchronize the processors, you can synchronize everything at once.
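The control and visibility described above can be illustrated with a toy model. This sketch does not represent any vendor's virtual platform or API. The classes `Core` and `VirtualPlatform`, the function `run`, and the made-up "instruction" each core executes are all assumptions chosen for illustration. The point is that a simulator decides the order in which cores advance, so every run is repeatable, and all state can be inspected at any tick.

```python
class Core:
    """Toy core with a program counter and one accumulator register."""
    def __init__(self, name, speed):
        self.name = name
        self.speed = speed   # instructions executed per platform tick
        self.pc = 0
        self.acc = 0

    def step(self, memory):
        for _ in range(self.speed):
            # Made-up instruction: read shared memory, update it, advance.
            self.acc += memory["shared"]
            memory["shared"] = (self.acc + self.pc) % 256
            self.pc += 1

class VirtualPlatform:
    """Advances all cores in a fixed order, one tick at a time."""
    def __init__(self, speeds):
        self.cores = [Core(f"cpu{i}", s) for i, s in enumerate(speeds)]
        self.memory = {"shared": 1}
        self.tick = 0

    def step(self):
        for core in self.cores:   # fixed order makes each run repeatable
            core.step(self.memory)
        self.tick += 1

    def snapshot(self):
        """Full visibility: memory plus every core's registers."""
        return {
            "tick": self.tick,
            "memory": dict(self.memory),
            "cores": [(c.name, c.pc, c.acc) for c in self.cores],
        }

def run(speeds, ticks):
    """Build a platform, run it, and return a snapshot per tick."""
    platform = VirtualPlatform(speeds)
    trace = []
    for _ in range(ticks):
        platform.step()
        trace.append(platform.snapshot())
    return trace

if __name__ == "__main__":
    # Two cores, the second running twice as fast as the first.
    for snap in run([1, 2], ticks=3):
        print(snap)
```

Changing the `speeds` list varies the number and relative speed of cores, and running the same configuration twice produces identical traces, which is what makes a failing scenario replayable.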