Cadence’s Deep-Neural-Network Processor Pushes to 3.4 TMACs/W
Cadence extended its machine-learning (ML) offering with the Tensilica DNA 100 deep-neural-network processor (see figure), which incorporates Tensilica DSP support to handle new network layers. It targets end-node applications such as autonomous vehicles, robots, drones, surveillance systems, and augmented and virtual reality, where neural-network inference systems are being employed.
The Tensilica DNA 100 architecture, scalable from 0.5 to 12 TMACs (trillion multiply-accumulates), can deliver up to 3.4 TMACs/W. Its sparse compute engine provides high MAC utilization while reducing power requirements. The engine can double system throughput without pruning, or triple it with pruning.
The Tensilica DNA 100’s sparse compute engine provides high MAC utilization while reducing power requirements.
The system shrinks bandwidth requirements by compressing weight and activation values. To reduce computation, it executes only the non-zero MAC computations. Accelerators handle non-convolution layers, including pooling and Eltwise operations. The system, including its DSP support, is programmable to handle new software requirements, and the architecture layers are customizable. The system is compatible with the Tensilica Instruction Extensions (TIE), and the DNA 100 has its own direct-memory-access (DMA) support.
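The idea of issuing only non-zero MACs can be illustrated with a minimal sketch. This is not Cadence's implementation; the function name `sparse_dot` and the sample data are hypothetical, and the code only shows why skipping zero operands reduces the amount of work without changing the result.

```python
def sparse_dot(weights, activations):
    """Dot product that issues a MAC only when both operands are non-zero.

    Hypothetical illustration of zero-skipping; returns the accumulated
    result and the number of MACs actually issued.
    """
    acc = 0
    macs_issued = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # the product would be zero, so the MAC is skipped
        acc += w * a
        macs_issued += 1
    return acc, macs_issued


# Sample data (made up for illustration): many zeros in both vectors.
weights = [0, 3, 0, -2, 0, 0, 5, 1]
activations = [4, 0, 7, 1, 0, 2, 2, 0]

result, issued = sparse_dot(weights, activations)
dense = sum(w * a for w, a in zip(weights, activations))

print(f"result={result}, dense result={dense}")
print(f"MACs issued: {issued} of {len(weights)}")
```

In this made-up example, only the positions where both the weight and the activation are non-zero contribute, so the sparse loop reaches the same sum as the dense one with far fewer multiply-accumulates.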
Larger systems can be built using multiple DNA 100 processors on a chip. These are linked together using a network-on-chip (NoC) configuration; a chip-to-chip (C2C) link can be used to scale across chips.
“Our customers’ neural-network inference needs span a wide spectrum, both in the magnitude of AI processing and the types of neural networks, and they need one scalable architecture that’s just as effective in low-end IoT applications as it is in automotive applications demanding tens or even hundreds of TMACs,” says Lazaar Louis, senior director of product management and marketing for Tensilica IP at Cadence. “With the DNA 100 processor and our complete AI software platform and strong partner ecosystem, our customers can design products with the high performance and power efficiency required for on-device AI inferencing.”
Software support includes the Tensilica Neural Network Compiler that works with prior Cadence ML platforms. The system includes a network analyzer, a quantizer that converts weights to 8 or 16 bits, a network optimizer, a DMA and tile manager, and target-specific library selection. The architecture is also compatible with the Android Neural Networks API.
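To make the 8- or 16-bit quantization step concrete, here is a minimal sketch of symmetric linear quantization, one common scheme. The article does not say which scheme the Cadence quantizer uses, so the method, the function names `quantize_symmetric` and `dequantize`, and the sample weights are all assumptions for illustration.

```python
def quantize_symmetric(weights, bits=8):
    """Map float weights to signed integers of the given width.

    Assumed scheme: one scale per tensor, chosen so the largest
    magnitude maps to the largest representable integer.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Sample weights (made up for illustration).
weights = [0.5, -1.0, 0.25, 0.0]

q8, scale8 = quantize_symmetric(weights, bits=8)
q16, scale16 = quantize_symmetric(weights, bits=16)

err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, scale8)))
err16 = max(abs(w - r) for w, r in zip(weights, dequantize(q16, scale16)))

print("8-bit codes:", q8, "max error:", err8)
print("16-bit codes:", q16, "max error:", err16)
```

Under this scheme the rounding error per weight is bounded by half the scale, so the 16-bit codes reproduce the weights more closely than the 8-bit ones at the cost of twice the storage.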