Running machine-learning (ML) neural networks at the edge has two prerequisites: high performance and low power consumption. Deep Vision's ARA-1 polymorphic dataflow architecture is designed to meet both needs. It typically draws less than 2 W while delivering low-latency inference by minimizing data movement within the chip. The chip targets edge applications, where all of the processing is done locally rather than shipping data to the cloud.
The dataflow architecture is built around multiple neural cores (see figure) linked to an explicitly managed memory and a hardware task manager. The neural cores run a custom instruction set that’s designed for data reuse to minimize data movement.
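Deep Vision hasn't published the instruction set, but the data-reuse principle it describes is the same one behind loop tiling: keep a block of data resident in fast local memory and reuse it many times before fetching the next block. The sketch below illustrates that idea with an output-stationary tiled matrix multiply in plain Python/NumPy; the tile size and the notion of "core-local memory" are stand-ins for the chip's actual resources, not the ARA-1 ISA.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    """Output-stationary tiled matrix multiply (conceptual sketch).

    The output tile `acc` stays resident for the whole inner loop while
    blocks of A and B stream past it, and every element of a fetched
    block is reused `tile` times inside the block product. Cutting
    refetches this way is the data-movement saving that dataflow
    architectures are built around.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # One output tile; on a dataflow chip this accumulator
            # would live in core-local memory, not DRAM.
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=a.dtype)
            for p in range(0, k, tile):
                acc += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
            c[i:i + tile, j:j + tile] = acc
    return c
```

Only the memory-access order changes, not the math, so `np.allclose(tiled_matmul(A, B), A @ B)` holds for any inputs.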
The architecture is fully programmable, although most developers will use Deep Vision’s compiler to map their models to the system. The compiler detects multiple dataflow patterns in each layer of an ML model and maps those to the cores. A Tensor Traversal Engine coordinates the chip resources to optimize system utilization.
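Deep Vision hasn't documented the compiler's internals, so the following is only a toy illustration of the general idea it describes: tag each layer of a model with a dominant dataflow pattern, then assign the tagged layers to cores. The pattern taxonomy, the Layer type, and the round-robin assignment are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum, auto

class DataflowPattern(Enum):
    # Hypothetical taxonomy; the real compiler's categories aren't public.
    WEIGHT_STATIONARY = auto()   # reuse weights across many activations
    OUTPUT_STATIONARY = auto()   # accumulate partial sums in place
    INPUT_STATIONARY = auto()    # reuse activations across many filters

@dataclass
class Layer:
    name: str
    kind: str  # e.g. "conv", "fc", "depthwise"

def classify(layer: Layer) -> DataflowPattern:
    """Pick a dataflow pattern per layer (toy heuristic)."""
    if layer.kind == "fc":
        return DataflowPattern.WEIGHT_STATIONARY
    if layer.kind == "depthwise":
        return DataflowPattern.INPUT_STATIONARY
    return DataflowPattern.OUTPUT_STATIONARY

def map_to_cores(layers: list[Layer], num_cores: int) -> dict[int, list[tuple[str, DataflowPattern]]]:
    """Assign classified layers to cores round-robin.

    A real compiler would balance load and schedule execution through
    the hardware task manager; round-robin just keeps the sketch short.
    """
    schedule: dict[int, list[tuple[str, DataflowPattern]]] = {c: [] for c in range(num_cores)}
    for i, layer in enumerate(layers):
        schedule[i % num_cores].append((layer.name, classify(layer)))
    return schedule
```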
The ARA-1 is a general-purpose platform, but it’s optimized for vision applications. In particular, Deep Vision targets camera-based ML. One use case is automotive in-cabin monitoring. Another is smart retail, such as food stores that track inventory.
The Deep Vision compiler supports popular frameworks, including TensorFlow, Caffe2, PyTorch, and MXNet, as well as interchange formats like ONNX. The toolset includes a bit-accurate simulator, a profiler with layer-wise statistics, and a power-utilization optimizer.
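The ONNX path is the most framework-agnostic way in. As a minimal sketch, a trained PyTorch model can be exported with the standard torch.onnx.export call and the resulting file handed to an ONNX-consuming toolchain; the Deep Vision compiler invocation itself isn't public, so it isn't shown here.

```python
import torch
import torchvision.models as models

# Any trained network will do; a pretrained ResNet-18 stands in for a
# real vision workload here.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# ONNX export traces the model with a sample input of the deployment shape.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=13,
)
# resnet18.onnx is what an ONNX-consuming compiler, such as
# Deep Vision's, would take as its input.
```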
The chip is available standalone or in several form factors. These include a USB device and an M.2 module, each with a single ARA-1, as well as a U.2 PCI Express module carrying up to four ARA-1 processors onboard. The U.2 module is hot-swappable and designed for use in edge servers.