Google Builds Custom Processors for Machine Learning
When AlphaGo, Google's artificial intelligence program, defeated champion Go player Lee Sedol earlier this year, everyone praised its advanced software brain. But the program, developed by Google’s DeepMind research team, also had some serious hardware brawn standing behind it.
The program was running on custom accelerators that Google’s hardware engineers had spent years building in secret, the company said. With the new accelerators plugged into AlphaGo's servers, the program could recognize patterns in its vast library of game data faster than it could with standard processors. The increased speed helped AlphaGo make the kind of quick, intuitive judgments that have escaped other computers trying to conquer the game.
The new chip belongs to a family of processors known as application-specific integrated circuits (ASICs), which are designed around a single task rather than general-purpose computing. This one was built specifically for deep neural networks, a kind of machine learning loosely modeled on the way millions of neurons pass signals through the human brain.
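To make that concrete, here is a minimal sketch of what a single layer of such a network computes, written in plain NumPy; the layer sizes and names are illustrative, not anything specific to Google's hardware:

    import numpy as np

    # One layer of a toy neural network: each "neuron" weighs its inputs,
    # sums them, and passes the total through a nonlinearity.
    rng = np.random.default_rng(0)

    inputs = rng.standard_normal(4)          # signals from the previous layer
    weights = rng.standard_normal((8, 4))    # 8 neurons, 4 input weights each
    biases = np.zeros(8)

    activations = np.maximum(0.0, weights @ inputs + biases)  # ReLU
    print(activations)

A deep network stacks many such layers, and it is exactly this flood of multiply-and-add operations that the TPU is built to churn through.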
Google said that many of its services are using the new chips, known as tensor processing units (TPUs), which serve as the hardware engine for TensorFlow, Google's machine learning framework. Google engineers said the chip is "orders of magnitude" faster and more efficient than the other chips they tried out in their data centers.
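TensorFlow itself is open source, and the division of labor is visible from the outside: a program describes a graph of tensor operations, and the runtime maps that graph onto whatever hardware is available. A minimal sketch using the TensorFlow 1.x graph API of the period (the TPU backend was not publicly exposed at the time):

    import tensorflow as tf

    # Describe a tiny computation graph: y = Wx + b
    x = tf.placeholder(tf.float32, shape=[None, 3])
    W = tf.Variable(tf.random_normal([3, 1]))
    b = tf.Variable(tf.zeros([1]))
    y = tf.matmul(x, W) + b

    # The runtime, not the program, decides which device runs the graph.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))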
“TPU is tailored to machine learning applications, allowing the chip to be tolerant of reduced computational precision, which means it requires fewer transistors per operation,” Norm Jouppi, a distinguished hardware engineer at Google, wrote in a blog post. “Because of this, we can squeeze more operations per second into the silicon.”
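Jouppi's point about precision can be illustrated with a toy 8-bit quantization in NumPy; the functions below are a stand-in for the idea of reduced-precision arithmetic, not Google's actual scheme:

    import numpy as np

    def quantize_uint8(x):
        # Map floats onto 256 evenly spaced levels (8 bits).
        lo, hi = float(x.min()), float(x.max())
        scale = (hi - lo) / 255.0
        return np.round((x - lo) / scale).astype(np.uint8), scale, lo

    def dequantize(q, scale, lo):
        return q.astype(np.float32) * scale + lo

    rng = np.random.default_rng(1)
    weights = rng.standard_normal(1000).astype(np.float32)

    q, scale, lo = quantize_uint8(weights)
    error = np.abs(weights - dequantize(q, scale, lo)).max()
    print(error)  # small relative to the weights themselves

Neural networks tolerate that small rounding error, and an 8-bit multiplier needs far fewer transistors than a 32-bit floating-point unit, which is where the extra operations per second come from.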
Google has been quietly hiring chip designers for years, and the new accelerator is the first public evidence of that work. The company, which had previously relied on graphics processors for machine learning, has been running TPUs inside its data centers for over a year, Jouppi said. He added, however, that the TPUs have replaced only some of its existing chips.
Soldered onto circuit boards, the TPU chips slide into the disk-drive slots of Google's servers. The graphics chips and CPUs carry out all the same tasks they normally would, while the TPUs handle the exhaustive calculations required by Google's machine learning software. The chip was revealed at Google I/O, the company's annual developer conference.
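That split, with everyday work staying on general-purpose chips while heavy tensor math is pinned to an accelerator, mirrors how TensorFlow already lets programmers place operations on devices. A sketch using the public GPU device strings, since TPU devices were not externally addressable at the time:

    import tensorflow as tf

    # Pin the heavy matrix math to an accelerator...
    with tf.device('/gpu:0'):
        a = tf.random_normal([1024, 1024])
        b = tf.random_normal([1024, 1024])
        c = tf.matmul(a, b)

    # ...while lightweight bookkeeping stays on the CPU.
    with tf.device('/cpu:0'):
        total = tf.reduce_sum(c)

    # Soft placement falls back to the CPU if no GPU is present.
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        print(sess.run(total))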
Google is not the first company to build custom accelerators for machine learning. Other firms with AI platforms on the cloud are tinkering with completely new chip architectures. Nervana Systems, a machine learning startup, is working toward chips that mimic not only the abilities of the human brain, but also its structure.
IBM has built a processor known as TrueNorth on the same principle. The chip can run advanced AI programs such as image and speech recognition on very little power, without relying on the cloud. About the size of a postage stamp, TrueNorth contains around 5.4 billion transistors and is capable of 46 billion synaptic operations per second per watt.
Microsoft and China's Baidu are taking another route for their cloud services, using field-programmable gate arrays (FPGAs) that can be reprogrammed to suit the application. Still other companies are holding onto the graphics processing unit (GPU), the current gold standard for machine learning. Facebook, for instance, recently open-sourced the architecture behind its custom GPU accelerators.
Google's ASICs, however, can perform the exhaustive calculations required by machine learning more efficiently than many FPGAs and microprocessors. David Wentzlaff, an electrical engineering professor at Princeton University, has found that ASICs are up to three times faster than FPGAs, which in turn are up to three times faster than microprocessors, a combined advantage of up to roughly nine times over general-purpose chips. His research has also shown that ASICs are vastly more power-efficient, delivering up to six orders of magnitude better performance per unit of chip area than the same workload running in software on a microprocessor.
Google believes its new chip is seven years—or roughly three processor generations—ahead of other chips for machine learning. That level of computing could give Google an edge over competitors like Amazon and Facebook, both of which are increasingly merging their services with machine learning.
More than 100 Google programs now have some element of machine learning, Jouppi said. These include the popular Street View feature in Google Maps, along with the company's search engine and cloud computing platform. Similar software has also found its way into new messaging apps and into Inbox Smart Reply, a service that monitors your inbox and suggests short answers to e-mails.
Sundar Pichai, chief executive of Google, ended the keynote speech at Google I/O with a discussion of the company’s research in robotics and medicine, underlining the special place that machine learning has in Google's future. “As the state-of-the-art capabilities in machine learning and AI progress, we see them becoming very versatile in a wide range of fields,” he said.