The evolving nature of communications and other audio/video systems demands that signal-processing approaches be scalable and flexible. Scalability lets systems tackle increasingly complex tasks, accommodate new features and additional resources, and improve task performance. An example is adding more filtering when noise overpowers a signal. Flexibility is required for the system to implement new algorithms quickly.
In communications systems such as cellular telephones, and in audio and video processing, all signal processing must occur in real time. Additionally, changes to the algorithms must be made quickly because market differentiation is short-lived. Millisecond delays are too long when bits are flying past at tens of thousands to millions per second. Updatable structures need to reconfigure themselves in a single cycle to avoid losing a large block of bits or dropping a connection.
In the past, reconfigurable DSP blocks were implemented on SRAM-based field-programmable gate arrays (FPGAs). But, typically, updating the SRAM configuration cells requires several milliseconds for the megagate-density devices. During those milliseconds, the array isn't usable and all signal-processing activity halts. That's not acceptable when fast operation is necessary.
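To see why a multi-millisecond reload is unacceptable, it helps to count the bits that arrive while the array is offline. The sketch below is a back-of-the-envelope calculation only. The 1-Mbit/s stream and 5-ms reload time are illustrative values picked from the ranges mentioned above, not measurements of any particular device, and `bits_missed` is a hypothetical helper.

```python
def bits_missed(bit_rate_bps: int, downtime_ns: int) -> int:
    """Whole bits that arrive while the array is unavailable.

    Integer arithmetic (nanoseconds) avoids floating-point rounding.
    """
    return bit_rate_bps * downtime_ns // 1_000_000_000


# Illustrative assumption: a 1-Mbit/s stream and a 5-ms configuration reload.
print(bits_missed(1_000_000, 5_000_000))  # prints 5000

# Illustrative assumption: the same stream and a 10-ns (single-cycle) switch.
print(bits_missed(1_000_000, 10))  # prints 0
```

Under these assumed numbers, a 5-ms outage costs 5000 bits, while a switch lasting a few nanoseconds costs none.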
Chameleon Systems' designers resolved the flexibility and scalability issues with the CS2000 family of reconfigurable communications processors (RCPs). The family combines features from multiple product types. Each RCP contains aspects of a DSP chip, a microprocessor, an FPGA, and a custom ASIC. It can't be classified as any one of these. Instead, it forms a new class of product: a configurable compute platform. This solution delivers higher system performance than a multichip microprocessor/DSP and FPGA alternative. Programming is easier, shortening the time to market.
Combined in the CS2000 architecture are a 32-bit RISC processor, blocks of embedded memory, a proprietary reconfigurable processing fabric, and a large number of programmable I/O pins (Fig. 1). The RISC processor is based on the ARC core from ARC Cores Ltd. Also on-chip is a PCI v. 2.1-compliant PCI-interface controller for interfacing to a host system. There's also an external-memory controller that connects to a 64-bit-wide memory bus. The CS2000 offers a DMA subsystem with 16 distributed DMA engines. These enable high-speed data transfers in and out of the reconfigurable processing fabric.
A 128-bit-wide split-transaction bus, dubbed the RoadRunner system bus, provides a time-division multiplexed communications path. This ties the control portion of the chip to the configurable processing fabric. The fabric contains a configurable interconnect structure and repetitive blocks of compute logic known as slices.
The slices are independently configurable. They have user-configurable compute resources in the form of three sub-blocks, referred to as tiles. Every tile has seven 32-bit datapaths, two 16- by 24-bit single-cycle multipliers, four local-storage-memory (LSM) blocks (each 128 words deep by 32 bits wide), and a control logic unit (Fig. 2).
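The per-tile figures above can be rolled up into per-slice totals with a few lines of arithmetic. The following is a minimal sketch. The `Tile` class and `slice_totals` function are hypothetical names used only for this illustration, and the number of slices per device is deliberately left out because it isn't stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Per-tile resources as quoted in the text."""
    datapaths: int = 7        # 32-bit datapaths
    multipliers: int = 2      # 16- by 24-bit single-cycle multipliers
    lsm_blocks: int = 4       # local-storage-memory blocks
    lsm_words: int = 128      # depth of each LSM block
    lsm_word_bits: int = 32   # width of each LSM word

    def lsm_bytes(self) -> int:
        """Total local storage in one tile, in bytes."""
        return self.lsm_blocks * self.lsm_words * self.lsm_word_bits // 8


TILES_PER_SLICE = 3


def slice_totals(tile: Tile = Tile()) -> dict:
    """Resources available in one slice (three tiles)."""
    return {
        "datapaths": TILES_PER_SLICE * tile.datapaths,
        "multipliers": TILES_PER_SLICE * tile.multipliers,
        "lsm_blocks": TILES_PER_SLICE * tile.lsm_blocks,
        "lsm_bytes": TILES_PER_SLICE * tile.lsm_bytes(),
    }


print(slice_totals())
```

With the quoted figures, one slice works out to 21 datapaths, six multipliers, and 12 LSM blocks holding 6144 bytes (6 kbytes) of local storage.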
The configurable interconnect fabric enveloping all the slices is key to the flexibility and real-time performance of the RCP. In a single clock cycle, it can be reconfigured. There's no delay when a new circuit configuration takes on a task from the existing system configuration. Designers employ a configuration plane and a second "shadow" plane to accomplish this. With a simple signal, the shadow bit plane is substituted for the configuration data plane. The old data plane can then be updated in the background. Therefore, while executing its current task, the system prepares for the next potential change.
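The two-plane scheme is a form of double buffering, and a small software model may make the sequence easier to follow. This is a conceptual sketch only, not a description of the actual hardware or of any Chameleon API. The class and method names (`ShadowConfigFabric`, `load_shadow`, `swap`) and the configuration labels are assumptions made for illustration.

```python
class ShadowConfigFabric:
    """Toy model of an active configuration plane plus a shadow plane."""

    def __init__(self, initial_config):
        # Index 0 starts as the active plane; index 1 is the empty shadow.
        self._planes = [initial_config, None]
        self._active = 0

    @property
    def active_config(self):
        """Configuration the fabric is currently executing."""
        return self._planes[self._active]

    def load_shadow(self, config):
        """Background load: touches only the inactive plane."""
        self._planes[1 - self._active] = config

    def swap(self):
        """Model the single-cycle switch: the shadow plane becomes active."""
        if self._planes[1 - self._active] is None:
            raise RuntimeError("shadow plane has not been loaded")
        self._active = 1 - self._active


# Hypothetical configuration names, used only to show the ordering.
fabric = ShadowConfigFabric("fir_filter")
fabric.load_shadow("fft")       # runs while fir_filter keeps executing
print(fabric.active_config)     # prints fir_filter
fabric.swap()
print(fabric.active_config)     # prints fft
fabric.load_shadow("viterbi")   # old plane is reused for the next change
```

The point of the design is that the slow step (loading) never touches the plane in use, so the only operation on the critical path is the swap itself.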
At Chameleon, designers developed the tools and support structure needed to create the software and port the algorithms to the architecture. The C~Side tools cover the development flow, runtime services, hardware and software debug, and verification aspects of software development. They employ standard C and HDL languages for design entry. Among the tools included is an optimized GNU C compiler for the 32-bit RISC processor. There's also an optimized HDL synthesizer for the reconfigurable processing fabric and a full-chip simulator.
Specially developed firmware solves the challenge of interfacing the 32-bit RISC processor in the control portion of the chip to the reconfigurable fabric. Created by Chameleon, it's called the eConfigurable Basic I/O Services (eBIOS). This software provides a seamless interface, allowing the processor to easily hand off tasks to the processing fabric. The eBIOS performs resource allocation, configuration management, and DMA services. Its calls are generated automatically at compile time, but they can be edited for precise control of any function.
In a typical application, the eBIOS first allocates required fabric resources into one or more slices. Next, the configuration loads into those slices. The eBIOS then synchronizes the local store memories and registers in the datapath units. After the DMA transfers are done, the algorithm executes on the configurable fabric. Finally, the eBIOS manages the return from execution.
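The sequence just described can be sketched as a short driver routine. The code below is a toy simulation meant only to show the ordering of the steps. It is not the eBIOS interface, and every name in it (`FabricModel`, `run_task`, the method names, and the four-slice device) is a hypothetical stand-in.

```python
class FabricModel:
    """Toy stand-in for the reconfigurable fabric; not the eBIOS API."""

    def __init__(self, num_slices):
        self.free_slices = list(range(num_slices))
        self.log = []  # records the order of the steps

    def allocate(self, count):
        if count > len(self.free_slices):
            raise RuntimeError("not enough free slices")
        taken = self.free_slices[:count]
        self.free_slices = self.free_slices[count:]
        self.log.append("allocate")
        return taken

    def load_configuration(self, slices, kernel):
        self.log.append("configure")

    def synchronize_state(self, slices):
        # Stands in for syncing local store memories and datapath registers.
        self.log.append("synchronize")

    def dma_in(self, slices, data):
        self.log.append("dma")
        return list(data)

    def execute(self, kernel, data):
        self.log.append("execute")
        return [kernel(x) for x in data]

    def release(self, slices):
        self.free_slices.extend(slices)
        self.log.append("return")


def run_task(fabric, kernel, data, slices_needed=1):
    """Walk through the steps in the order given in the text."""
    slices = fabric.allocate(slices_needed)
    fabric.load_configuration(slices, kernel)
    fabric.synchronize_state(slices)
    buffered = fabric.dma_in(slices, data)
    result = fabric.execute(kernel, buffered)
    fabric.release(slices)
    return result


# Hypothetical four-slice device and a trivial "algorithm" that doubles samples.
fabric = FabricModel(num_slices=4)
print(run_task(fabric, lambda x: 2 * x, [1, 2, 3]))  # prints [2, 4, 6]
print(fabric.log)
```

In the real tool flow these calls are generated at compile time, so a developer would normally see something like this only when editing them for finer control.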