Computer vision tries to replicate human perception and associated brain functions to acquire, process, analyze, understand, and then act on an image. But replicating this process is extremely challenging. Why? Let’s look at an example.
Driving back to work from your lunch break, you have a craving for dessert. As your eyes scan passing businesses, your brain applies filters to isolate one that sells sweets, and it retrieves your favorite dessert items from memory, such as “doughnuts,” “candy,” and “cookies.” Once you see a match for your craving, you pull into the shop to get your treat. As you enter the shop, your sense of smell triggers memories of the dessert you associate with happiness, and you visually select that item.
Designers analyze what hardware and software are required to perform this same task. The seemingly simple concept of isolating an image to identify it has taken years of research and development to accomplish. Today, teams using computer-vision hardware and software algorithms coupled with deep learning are seeing success in identifying objects.
However, as of now, a computer-vision system can’t be pointed at a random object, asked “What is that?” and be expected to answer with 100% reliability every time. For example, during road testing of a self-driving car in Australia, the computer-vision system could not figure out what a kangaroo was.
1. Tractica reports the actual and predicted computer-vision hardware and software market revenue.