THE LITTLE ENGINES THAT COULD
The dynamic instruction set processor from GateChange Technologies employs a 32-by-32 array of pipelined reconfigurable processing elements. With this processor, designers can dynamically tailor the architecture and instruction resources by creating optimal-length instruction words. The words are part of a virtual instruction set that's added to the instruction set of the on-chip ARM7TDMI controller. The virtual instructions can be of any word width, from a single bit to thousands of bits.
Each processing element is a small arithmetic unit that can perform an 8-bit logic operation or a 4-bit multiplication. To support the 32-by-32 array of processing elements, 32 blocks of SRAM (each 2 kwords by 8 bits) provide the local data storage for the computations. A test chip based on the architecture, the 2KL1024, implements the dynamic instruction set and the full 32-by-32 processor array. Four high-speed serial I/O ports supply additional data-transfer interfaces. Applications targeted by the dynamic instruction set processor include large database searches, comparisons, and matching, as well as security applications such as encryption and decryption. Also targeted are biometric applications, such as fingerprint, hand, or palmprint recognition, as well as video processing.
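To put the array and memory figures in perspective, the short sketch below totals them. It assumes that "kword" means 1,024 words, and the per-block and per-element figures assume the SRAM is divided evenly across the array, which the description doesn't state. The function and constant names are illustrative only.

```python
# Figures taken from the text; "kword" is assumed to mean 1,024 words.
ARRAY_ROWS = 32
ARRAY_COLS = 32
SRAM_BLOCKS = 32
WORDS_PER_BLOCK = 2 * 1024
BITS_PER_WORD = 8


def array_summary():
    """Total the processing elements and local SRAM of the array."""
    elements = ARRAY_ROWS * ARRAY_COLS
    total_bits = SRAM_BLOCKS * WORDS_PER_BLOCK * BITS_PER_WORD
    total_bytes = total_bits // 8
    return {
        "processing_elements": elements,
        "sram_bytes": total_bytes,
        # The two figures below assume an even split, which is a guess.
        "elements_per_sram_block": elements // SRAM_BLOCKS,
        "sram_bytes_per_element": total_bytes // elements,
    }


if __name__ == "__main__":
    for name, value in array_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions, the array holds 1,024 processing elements backed by 64 kbytes of local SRAM, or 64 bytes per element if the storage were shared evenly.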
Another attempt at highly configurable signal processing is the PicoArray from PicoChip. The scalable, multiprocessor baseband IC integrates hundreds of processing elements into a single array that can deliver a throughput of 30 GMACs. The PicoArray PC101 consists of an array of 16-bit processors, each with its own arithmetic units, processing elements, and both program and data memories. The processors are programmed individually during device initialization. The company estimates that each 16-bit processor has control capability close to that of an ARM9 CPU and DSP performance close to that of a TI C54xx series device.
Although the PicoArray is reconfigurable, it's not meant for applications that require cycle-by-cycle updates. Rather, it's intended for applications in which a reconfiguration request may take place every few hours or days. The company developed extensive code libraries that handle many communications functions.
Two additional processors, one from ChipWrights and the other from Morphics, are closer to fixed-architecture vector engines. The ChipWrights approach employs eight parallel datapaths and a central serial datapath, as well as a four-bank on-chip memory that's interleaved on a 32-bit basis and shared among the datapaths.
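The effect of interleaving a four-bank memory on a 32-bit basis can be sketched as follows. The mapping shown (byte addressing, with consecutive 32-bit words rotating through the banks in order) is the conventional low-order scheme and is an assumption here; the actual ChipWrights mapping isn't described. The function names are hypothetical.

```python
NUM_BANKS = 4   # four on-chip banks, per the text
WORD_BYTES = 4  # 32-bit interleave granularity, per the text


def bank_of(byte_address):
    """Return the bank holding a byte address (assumed low-order interleave)."""
    return (byte_address // WORD_BYTES) % NUM_BANKS


def banks_touched(start_address, n_words):
    """List the bank hit by each of n_words consecutive 32-bit words."""
    return [bank_of(start_address + i * WORD_BYTES) for i in range(n_words)]


if __name__ == "__main__":
    # Eight consecutive words cycle through the banks twice.
    print(banks_touched(0, 8))
```

With this kind of mapping, a sequential stream of 32-bit words spreads evenly over the banks, so several datapaths reading neighboring words are less likely to collide on the same bank.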
The eight parallel datapaths implement vector operations, and they all perform the same operation on different data (SIMD). Unlike traditional vector architectures, however, each datapath has its own register file. Thus, each can be envisioned as operating by itself. As a result, programmers don't have to think in parallel to use the engines. Rather, they can just concentrate on one datapath at a time.
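The programming model described above, in which code is written for one datapath and applied across all eight, can be illustrated with a small software model. This is only a sketch: the register names, the sample kernel, and the `run_simd` helper are invented for illustration and don't represent the ChipWrights instruction set or tools. The loop runs the lanes one after another, standing in for hardware that would execute them in lockstep.

```python
NUM_LANES = 8  # eight parallel datapaths, per the text


def scale_and_offset(regs):
    """Kernel written for a single datapath: acc = x * gain + offset."""
    regs["acc"] = regs["x"] * regs["gain"] + regs["offset"]


def run_simd(kernel, lane_inputs):
    """Apply the same kernel to every lane, each with a private register file."""
    if len(lane_inputs) != NUM_LANES:
        raise ValueError(f"expected {NUM_LANES} lanes, got {len(lane_inputs)}")
    # Copy the inputs so each lane owns its registers.
    register_files = [dict(inputs) for inputs in lane_inputs]
    for regs in register_files:
        kernel(regs)  # same operation, different data
    return register_files


if __name__ == "__main__":
    inputs = [{"x": lane, "gain": 2, "offset": 1} for lane in range(NUM_LANES)]
    results = run_simd(scale_and_offset, inputs)
    print([regs["acc"] for regs in results])
```

The kernel never refers to a lane index or to another lane's registers, which is the sense in which the programmer can concentrate on one datapath at a time.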
The Morphics approach uses a programmable distributed dataflow architecture optimized for 3G baseband processing. Though it achieves high throughput, its fixed architecture limits its flexibility. The first chip from the company performs all baseband receive and transmit channel processing required on a channel card between the digital antenna interface and the channel codec function, for up to 64 mobile phone lines. A control processor is used alongside the 3G-BP chip on the channel card. It performs the network termination and hosts the layer 1 software that manages the processing resources on the 3G-BP.
See associated web-only figure