How to Scale a Multimedia Design from Voice, Audio, and Video to AI (.PDF Download)

May 29, 2018

Facial recognition and voice control have landed, and they’re everywhere. Police officers are plucking offenders out of crowds of 60,000 or more; retail stores are pairing their high-definition displays with high-resolution cameras to monitor customers’ facial expressions; and, of course, smartphones are using facial recognition for user authentication.

The applications are myriad, yet facial recognition is essentially a form of advanced pattern recognition, which itself is being enabled by neural-network-based, deep-learning algorithms for artificial intelligence (AI). These algorithms belong to a class similar to the powerful ones used for autonomous vehicles and medical imaging, as well as for simpler defect-detection applications on the factory floor or intruder detection in the home.

Those latter two types of systems are being trained to do more advanced defect detection on manufactured goods and analysis of consumer behavior patterns. Classical computer-vision systems simply can’t perform at such levels.

The primary reason deep learning has made such alarming advances is the combination of GPUs and artificial neural networks (ANNs) and their variants, such as convolutional neural networks (CNNs). Neural networks try to emulate the human brain, but they are essentially collections of simple, interconnected processing elements, each with multiple weighted inputs and a single output (Fig. 1). That output is fed to the elements of the next hidden layer, and the process is repeated.

1. Artificial neural networks (ANNs) comprise multiple simple processing elements, each with weighted inputs and a single output. That single output forms the input to multiple elements at the next (hidden) layer. (Source: ViaSat)
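
The processing element described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the sigmoid activation, the bias term, and every weight value are assumptions chosen only to make the example concrete, and the names `neuron`, `layer`, and `forward` are hypothetical.

```python
import math


def neuron(inputs, weights, bias):
    """One processing element: multiple weighted inputs, a single output.

    The bias term and the sigmoid squashing function are assumptions;
    the article only specifies weighted inputs and one output.
    """
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Feed each layer's outputs to the next layer, and repeat."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical, untrained weights: 2 inputs -> 3 hidden neurons -> 1 output.
network = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.0, 0.0]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]

output = forward([1.0, 1.0], network)
print(output)
```

In a real system the weights would be learned during training rather than written by hand, and the layer arithmetic would typically run as matrix operations on a GPU.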

In an image-processing example, an image is fed to the input. Subsequently, the first layer could perform edge detection, the second layer could do feature extraction (such as an ear and nose, or a STOP sign, or a type of defect), and the next layer could do Sobel edge detection, followed by contour detection at the next layer, and on it goes, depending on the application.
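
To make the edge-detection stage concrete, here is a small sketch of the classical Sobel operator applied to a grayscale image stored as a list of lists. It is a hand-written filter, shown only to illustrate the kind of computation an edge-detecting layer performs; in a trained CNN the kernel values are learned from data rather than fixed. The function name `sobel_magnitude` and the test image are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude at each interior pixel.

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            result[r][c] = math.hypot(gx, gy)
    return result


# Hypothetical 5x6 image: dark on the left, bright on the right.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)

for row in edges:
    print(row)
```

The response is nonzero only in the two columns that straddle the dark-to-bright boundary, which is what makes the output usable as input to a later contour-detection stage.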
