Embedded designers put microprocessors in everyday
products like cars, phones, cameras, TVs,
music players, and printers, as well as the communications
infrastructure, which the general
public doesn’t get to see. They know how important it is for their
products to work, and preferably to work better than their
competitors’ products.
But the systems-on-a-chip (SoCs) behind them continue to
grow in complexity, making that simple goal harder to achieve,
particularly with the rise of multicore systems. Getting these systems
to work well means giving engineers throughout the design
and test cycle visibility into what their systems are doing. At the
modeling stage, visibility is provided in the modeling tool. Once
you move to a physical implementation, though, the designer must
include specific mechanisms to provide visibility.
The choice of mechanism should respond directly to the
needs of the engineers who will use it: those doing hardware
bring-up, low-level system software, real-time operating-system
(RTOS) and OS porting, application development, system integration,
performance optimization, production test, in-field maintenance,
returns failure analysis, and other functions.
Although their respective tools may handle and
present the data in different ways, they all rely on getting debug
and trace data from the target SoC.
TRADEOFF DECISIONS
The easy answer is to fit everything and give full visibility to everything
happening on-chip in real time. Most processors offer good
debug and trace capabilities (Embedded Trace Macrocells for
ARM processors, PDtrace for MIPS, and Nexus trace for PowerPC
and several DSPs), as do the interconnect fabrics. Also, custom
debug capabilities can be added to custom cores.
These capabilities can be integrated at the system level together
with systems such as the ARM CoreSight Architecture (Fig. 1),
the Infineon TriCore Multi-Core Debug Solution (MCDS), or
the MIPS/FS2 Multi-core Embedded Debug (MED). But the
costs of such debug systems in IP design time or licensing fees,
silicon area, pins, and tools may need strong justification to fit into
tight budgets.
RUN-CONTROL DEBUG
Almost all SoC designs will need to enable basic run-control
debug, where the core can be halted at any instruction or data
access and the system state can be examined and changed if
required. Traditionally, this uses the JTAG port. However,
the number of pins can now be reduced to two (one bidirectional
data pin plus an externally provided clock, overlaid
on TMS and TCK) using technology such as ARM
Serial Wire Debug or Texas Instruments’ Spy-Bi-Wire in the
MSP430.
Where boundary-scan test isn’t employed,
or separate debug and test JTAG ports are
implemented, the two-pin interface can save
two to five pins (TDO, TDI, nTRST, nSRST,
and RTCK). Where boundary-scan test is
employed, the redundant pins can be reassigned
when the device isn’t in test mode. If they are
reassigned to a trace port, the trace port won’t
even cast a “test shadow.”
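The two-to-five figure follows from simple pin counting. The sketch below is illustrative only: it assumes a four-pin minimum JTAG port (TCK, TMS, TDI, TDO), treats the reset and return-clock pins as optional extras, and uses a hypothetical helper name, `pins_saved`.

```python
# Pins reused by the two-pin debug interface (clock and bidirectional data
# are overlaid on TCK and TMS, as described above).
TWO_PIN_INTERFACE = {"TCK", "TMS"}

# Assumption: a minimal JTAG port has these four pins...
MINIMAL_JTAG = {"TCK", "TMS", "TDI", "TDO"}
# ...and a design may also fit some or all of these.
OPTIONAL_JTAG = {"nTRST", "nSRST", "RTCK"}


def pins_saved(optional_fitted):
    """Count the JTAG pins freed by moving to the two-pin interface."""
    jtag_port = MINIMAL_JTAG | set(optional_fitted)
    return len(jtag_port - TWO_PIN_INTERFACE)


fewest = pins_saved([])             # only TDI and TDO are freed
most = pins_saved(OPTIONAL_JTAG)    # all five listed pins are freed
```

Under these assumptions, a design with only the minimal port frees two pins, and a design with every optional pin fitted frees five.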
Multicore SoCs that place cores in multiple
clock and power domains (mainly for energy management)
should replace a traditional JTAG daisy chain with a system that can maintain debug communications
between the debug tool and the target,
despite any individual core being powered
down or in sleep mode.
The CoreSight Debug Access Port
(DAP) is an example of a bridge between
the external debug clock and multiple
domains for cores in the SoC (Fig. 2). It
also can maintain debug communications
with any core at the highest frequency supported,
rather than the slowest frequency of
all cores on a JTAG daisy chain.
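The difference between a serial daisy chain and a per-domain bridge can be sketched with a toy model. Everything here is hypothetical: the core names, the clock figures, and the function names are illustrative and do not describe any real CoreSight interface.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    max_debug_mhz: float   # illustrative figure, not from any datasheet
    powered: bool = True


def daisy_chain_mhz(cores):
    """Serial chain: one unpowered TAP breaks the scan path (returns None);
    otherwise the slowest core sets the shared clock."""
    if not all(c.powered for c in cores):
        return None
    return min(c.max_debug_mhz for c in cores)


def bridged_mhz(cores):
    """Per-domain bridge: each powered core stays reachable at its own rate."""
    return {c.name: c.max_debug_mhz for c in cores if c.powered}


soc = [Core("cpu0", 50.0), Core("cpu1", 50.0), Core("dsp", 10.0)]
all_on_chain = daisy_chain_mhz(soc)      # limited by the dsp
soc[2].powered = False                   # dsp enters a power-down state
chain_after_sleep = daisy_chain_mhz(soc)
bridge_after_sleep = bridged_mhz(soc)
```

In this model the chain runs at the slowest core's rate while every core is powered and becomes unusable once one core powers down, whereas the bridged arrangement keeps the remaining cores reachable at their full rates.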
For designs requiring ultra-fast code
download or access to memory-mapped
peripheral registers while the core is running,
the ASIC designer should connect
a direct memory access (DMA) path from the
DAP to the system interconnect so the
debug tool can become a bus master on the
system bus.
For remote debug of in-field products or
large batch testing in which a debug tool
seat per device under test is unrealistic, the
designer can also connect the DAP into a
processor’s peripheral map. This permits
the target resident software to set up its
own debug and trace configurations.
A common criterion for embedded-system
debugging is the ability to debug from
reset and through partial power cycles,
requiring careful design of power domains
and reset signals. Critically, reset of the
debug control register should be separated
from that of the functional (non-debug)
system. Power-down can be handled in
different ways when debugging, such as
ignoring power-down signals or putting
the debug logic in different power domains
that aren’t powered down.
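Why the reset separation matters can be shown with a toy model. This is a sketch under stated assumptions, not a description of any real core: the class, its methods, the reset vector at address 0, and the 4-byte instruction step are all hypothetical.

```python
class DebuggableCore:
    """Toy core with separate functional and debug reset domains."""

    def __init__(self):
        self.pc = 0
        self.breakpoints = set()   # stands in for debug control state

    def functional_reset(self):
        # Resets only the functional (non-debug) state.
        self.pc = 0

    def debug_reset(self):
        # Resets only the debug state.
        self.breakpoints.clear()

    def run(self, max_steps):
        """Step until the pc matches a breakpoint or the step limit is hit.
        Returns True if the core halted on a breakpoint."""
        for _ in range(max_steps):
            if self.pc in self.breakpoints:
                return True
            self.pc += 4
        return False


core = DebuggableCore()
core.breakpoints.add(0)        # break on the first instruction after reset
core.pc = 0x100                # core has been running for a while
core.functional_reset()        # breakpoint survives the functional reset
halted_at_reset = core.run(max_steps=10)
```

Because the functional reset leaves the breakpoint in place, the model halts on the very first instruction. If a single reset cleared both domains, the breakpoint would be lost and debugging from reset would not be possible.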
The ability to stop and start all cores
synchronously is extremely valuable for
multicore systems that have inter-process
communication or shared memory. To
ensure that this synchronization is within
a few cycles, a cross-trigger matrix should
be fitted (Fig. 3).
The configuration registers of the cross-trigger
interface enable the developer to
select the required cross-triggering behavior,
e.g., which cores are halted on a breakpoint
hit on another core. If, on the other
hand, the cores have widely separated and
non-interfering tasks, it may be sufficient to
synchronize core stops and starts with the
debug tools.
Inevitably, tool-driven synchronization will lead to hundreds
of cycles of skid between cores stopping.
The synchronous starting of cores can be
achieved with either a cross-triggering
mechanism or via the test access port (TAP)
controller of each core.
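The kind of behavior selected through the cross-trigger configuration can be modeled as a simple lookup. The sketch below is illustrative only: the core names, the dictionary layout, and the function name are assumptions, and it models a single level of triggering with no onward propagation.

```python
# Hypothetical cross-trigger configuration: for each source core, the set of
# other cores that should halt when the source hits a breakpoint.
cross_trigger = {
    "cpu0": {"cpu1"},   # cpu0 and cpu1 share memory, so stop cpu1 too
    "cpu1": {"cpu0"},
    "dsp": set(),       # dsp runs an independent task and halts alone
}


def cores_halted(config, source):
    """Return, sorted, the cores that halt when `source` hits a breakpoint."""
    return sorted({source} | config.get(source, set()))


on_cpu0_hit = cores_halted(cross_trigger, "cpu0")
on_dsp_hit = cores_halted(cross_trigger, "dsp")
```

Here a breakpoint on cpu0 also halts cpu1, while a breakpoint on the dsp leaves the two CPUs running, matching the distinction between tightly coupled cores and cores with non-interfering tasks.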
Fitting multiple debug ports, one for each
core, has obvious silicon and pin overheads.
It also leaves the synchronization and power-down
issues to be managed by the tools.
This approach has merit only when the cores
are completely different and use completely different
tool chains, where the re-engineering costs
of sharing a single debug port with a single
JTAG emulator box are substantially higher
than the costs of duplicating debug ports
and debug tool seats.
It may suffice if two separate systems
co-reside on the same piece of silicon and
debugging both systems simultaneously is
rarely needed. An example might be an MCU plus
a dedicated DSP or data engine, where the
DSP or data engine isn’t reprogrammed by
applications but instead runs a set of fixed functions
developed independently.