Multimodal AI Solution Combines Vision and Voice Technologies

July 28, 2021

Renesas and Syntiant develop AI solution that enables low-power contactless operation for image processing in vision AI-based IoT and edge systems.

Alix Paultre

Renesas Electronics and Syntiant have released a voice-controlled multimodal AI solution that enables low-power contactless operation of image processing in vision AI-based IoT and edge systems. The solution is intended to accelerate the development of systems like self-checkout machines, security cameras, video conference systems, and smart appliances such as robotic cleaning devices. Delivering advanced voice and image processing capabilities, the solution combines the Renesas RZ/V Series vision AI MPU with the low-power multimodal, multi-feature Syntiant NDP120 Neural Decision Processor. Features include always-on functionality and quick voice-triggered activation from standby mode.

While user-defined voice cues drive activation and system operation, vision AI recognition tracks operator behavior, controls operation, or issues a warning when suspicious actions are detected. The multimodal architecture creates contactless user experiences for vision AI-based systems, and a dedicated, power-efficient chip for voice recognition reduces standby power consumption while speeding up system development.

“We anticipate that demand for multimodal systems that use multiple streams of input information – both image and voice – will increase moving forward as a way to improve both ease of use and safety,” said Hiroto Nitta, Senior Vice President and Head of SoC Business in the IoT and Infrastructure Business Unit at Renesas. “Through the collaboration between Renesas, a leader in low-power image AI technology, and Syntiant, a leader in voice AI technology, we will accelerate the adoption of low-power, ultra-small smart voice AI technology in embedded systems and deliver new combined solutions to customers globally.”

“Voice-based user interfaces will make it possible for customers to deliver new user experiences that bring the next generation of innovative ideas from concept to reality,” said Syntiant CEO Kurt Busch. “We’ve already shipped more than 15 million of our deep learning NDPs globally to enable always-on voice in a wide variety of consumer and industrial IoT applications. Our collaboration with Renesas delivers a powerful, low-power voice and image solution that is certain to accelerate traction among a global customer base in a variety of devices and use cases.”

The Renesas RZ/V Series MPU uses the DRP-AI (Dynamically Reconfigurable Processor-AI) accelerator with a power efficiency that eliminates the need for heat management such as heat sinks or cooling fans, reducing the bill of materials cost. Packaged with the Syntiant Core 2 neural network inference engine, the NDP120 can also run multiple applications simultaneously with power consumption as low as 1 mW.

About the Author

Alix Paultre | Editor-at-Large, Electronic Design

An Army veteran, Alix Paultre was a signals intelligence soldier on the East/West German border in the early ’80s, and eventually wound up helping launch and run a publication on consumer electronics for the US military stationed in Europe. Alix first began in this industry in 1998 at Electronic Products magazine, and since then has worked for a variety of publications in the embedded electronic engineering space. Alix currently lives in Wiesbaden, Germany.

Also check out his YouTube watch-collecting channel, Talking Timepieces.