Why is it called the Internet of Things (IoT), anyway? Why isn’t it called the Internet of Everything, since everything is becoming connected? Devices and objects that once stood alone are becoming connected to each other, to the Internet, or, more commonly, to both. Every chip and OEM device manufacturer now building components and solutions for the IoT, especially wearable and battery-powered devices, faces a performance-versus-power paradox that is driving the need for a new type of low-power processor.

More data, more sensors, faster responses, more connectivity, and smarter user interfaces all make these devices great to use. But these features all come at a price: more processor performance, more silicon area, more power, and more heat. While general-purpose or standard processors are popular for running applications in deeply embedded systems and subsystems, they typically aren’t optimized for dedicated tasks like those required to support IoT applications.

As a result, achieving the required processor performance can lead to exceeding the power budget of the embedded function, resulting in a bigger package and battery. The paradox happens right at the point where the power requirement for the application exceeds the specifications of the battery, packaging, or both. A new type of low-power processor that is efficient, configurable, and extensible is needed.

Consider wearable fitness bands. Processors in these devices need to do a lot of controlling, sensing, processing, storing, and interfacing while consuming very little power and area. A more standard processor that delivers 1.25 DMIPS/MHz at a maximum CPU speed of 300 MHz and consumes a minimum of ~30 µW/MHz could result in a fitness device that is too large to fit on the wrist, can't use as many sensors, and runs out of power before the end of the week. With a more efficient, configurable, and extensible CPU that delivers 1.77 DMIPS/MHz at up to 337 MHz and consumes ~11 µW/MHz, the band could be sleeker in design, use more sensors, and still run for more than 10 days without recharging.
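To make the comparison concrete, the short Python sketch below works through the figures quoted above. It assumes that power scales linearly with clock frequency at the quoted µW/MHz rate, which is a simplification; the class and field names are illustrative only. Because ~30 µW/MHz is stated as a minimum, the results for the standard core are lower bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """Headline figures for a CPU core, as quoted in the text."""
    name: str
    dmips_per_mhz: float  # Dhrystone performance per MHz
    max_mhz: float        # maximum clock frequency
    uw_per_mhz: float     # power per MHz (assumed to scale linearly)

    @property
    def peak_dmips(self) -> float:
        return self.dmips_per_mhz * self.max_mhz

    @property
    def peak_power_mw(self) -> float:
        # uW/MHz x MHz gives uW; divide by 1000 for mW.
        return self.uw_per_mhz * self.max_mhz / 1000.0

    @property
    def uw_per_dmips(self) -> float:
        # Power cost of one DMIPS of throughput, independent of clock speed.
        return self.uw_per_mhz / self.dmips_per_mhz


standard = Core("standard", dmips_per_mhz=1.25, max_mhz=300.0, uw_per_mhz=30.0)
efficient = Core("configurable", dmips_per_mhz=1.77, max_mhz=337.0, uw_per_mhz=11.0)

for core in (standard, efficient):
    print(f"{core.name}: {core.peak_dmips:.0f} DMIPS peak, "
          f"{core.peak_power_mw:.2f} mW at max clock, "
          f"{core.uw_per_dmips:.2f} uW per DMIPS")
```

Under these assumptions, the standard core spends nearly four times as much power for each DMIPS of throughput, which is the gap that shows up as battery size and runtime in the fitness band.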

Configurability And Extensibility

Processor configurability is very important to achieving the right balance of performance, power, and area in IoT applications. The ability to easily configure the processor by selecting, minimizing, adjusting, or reducing features to tailor its performance for specific application requirements is essential.

For example, selecting and optimizing the number of registers, the type of multiplier, and the number of interrupts and levels enables the core gate count and area to be modified to suit the application performance levels without wasting area and power. Additional adjustments to external bus type, code density options, program counter widths, and divider options enable further processor optimizations. The process should be automated and repeatable to allow simulation of different configurations to refine, improve, and adjust according to device requirements.

Extensibility is also key to designing a processor that supports next-generation IoT applications. It enables designers to add user-defined hardware like arithmetic logic unit (ALU) instructions, condition codes, core and auxiliary registers, and external interface signals to the processor core. By adding user-defined extensions to the processor, a new level of CPU performance efficiency can be achieved.

With extensions, the same performance can be delivered for less energy, either by lowering the clock frequency (less dynamic power), which suits mature process technologies, or by switching off the processor once it has finished (less leakage), which suits newer technologies. Alternatively, designers can get more performance for the same energy by executing more dedicated functions at the same clock frequency. Functions with real-time requirements that previously couldn’t be executed without hardware support also become feasible. A more standard CPU that can’t be extended and can run only its standard set of instructions cannot achieve this level of performance efficiency.
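The two energy-saving options can be compared with a rough back-of-the-envelope model. The Python sketch below is an illustration under stated assumptions, not a measured result: dynamic energy per cycle is taken as constant (no voltage scaling), leakage is a fixed power drawn only while the core is running, and the extension's own power overhead is ignored. The baseline cycle count, the per-cycle energy, and the leakage figure are hypothetical values chosen for the example.

```python
def task_energy_pj(cycles, clock_mhz, dyn_pj_per_cycle, leak_uw):
    """Energy in picojoules for one run of a task on a core that is
    powered only while the task is running."""
    active_us = cycles / clock_mhz          # MHz is cycles per microsecond
    dynamic_pj = cycles * dyn_pj_per_cycle  # 1 uW/MHz equals 1 pJ per cycle
    leakage_pj = leak_uw * active_us        # uW x us = pJ
    return dynamic_pj + leakage_pj


# Hypothetical values for illustration only.
BASE_CYCLES = 100_000     # task without extensions
EXT_CYCLES = 11_000       # same task with extensions (assumed 11% of cycles)
CLOCK_MHZ = 10.0
DYN_PJ_PER_CYCLE = 11.0   # equivalent to 11 uW/MHz
LEAK_UW = 5.0             # assumed leakage power while powered

# No extensions: full cycle count at the full clock.
baseline = task_energy_pj(BASE_CYCLES, CLOCK_MHZ, DYN_PJ_PER_CYCLE, LEAK_UW)

# Option 1: use the extensions and lower the clock so the task still
# finishes in the original time window.
slow_clock = task_energy_pj(EXT_CYCLES, CLOCK_MHZ * EXT_CYCLES / BASE_CYCLES,
                            DYN_PJ_PER_CYCLE, LEAK_UW)

# Option 2: use the extensions at the full clock, then switch the core off.
race_to_off = task_energy_pj(EXT_CYCLES, CLOCK_MHZ, DYN_PJ_PER_CYCLE, LEAK_UW)

for label, pj in (("baseline", baseline),
                  ("lower clock", slow_clock),
                  ("finish and switch off", race_to_off)):
    print(f"{label}: {pj / 1000:.1f} nJ")
```

In this simple model both options spend the same dynamic energy, because the cycle count is what matters. They differ only in leakage, so switching off early pays off more as leakage grows, consistent with the distinction between mature and newer technologies.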

Using Processor Extensions To Reduce Power

The Synopsys DesignWare ARC EM4 Processor, with a configurable and extensible 32-bit RISC microarchitecture, was developed to address the power/performance paradox in IoT and other applications. It can be optimized with configurable hardware extensions for a sensor application, for instance, specifically aimed at reducing power or energy consumption.

A typical wearable fitness band monitors metrics such as steps walked or run, heart rate, calories burned, and quality of sleep (Fig. 1). The ARC EM4 Processor would be used to filter and process multiple sensor data and then provide the results to a Bluetooth wireless transceiver. It also manages power and system control functions across the device.

This fitness monitor illustrates the energy reduction achievable by implementing hardware extensions on an ARC EM4 Processor. In this case, the extensions will be in the form of floating-point functions used to process data from multiple sensors, also known as sensor fusion.

Figure 2 depicts two ARC EM4 configurations. Configuration 1 corresponds to an ARC EM4 that does not include any floating-point extensions. Configuration 2 corresponds to an ARC EM4 that includes floating-point hardware from both a standard ARC FPX floating-point extension and other extensions for floating-point divide and square root.

Using a TSMC 90-nm LP library and a clock frequency of 10 MHz, configuration 2 requires only 11% of the cycles to execute the same sensor application as configuration 1 (see the table). This means the algorithm would execute in roughly one-ninth of the time using the extensions. The added processor extensions in configuration 2 result in a small increase in area (4.5%) and a small increase in instantaneous dynamic power of the core (7.2%). Although seemingly detrimental, these power and area increases are more than offset by the much shorter application completion time made possible with the extensions.

Power is defined as the amount of energy consumed per second. To get the energy consumption of the sensor application, the power numbers are multiplied by the execution time of the application. The execution time of the sensor application can be calculated by dividing the measured cycle count by the clock frequency. Both the power number and the cycle count are better for configuration 2, reducing the total energy consumption of the sensor application by a factor of 9.55, from 1271 nJ to (1271 × 0.106 × 0.988) = 133 nJ (Fig. 3).
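The calculation described above can be reproduced in a few lines of Python. This is a minimal sketch: the function names are illustrative, and the 0.106 and 0.988 factors are taken from the expression in the text as the cycle-count and power ratios of configuration 2 relative to configuration 1.

```python
def energy_joules(power_watts, cycles, clock_hz):
    """Energy = power x execution time, where time = cycles / clock frequency."""
    execution_time_s = cycles / clock_hz
    return power_watts * execution_time_s


def scaled_energy(baseline_energy, cycle_ratio, power_ratio):
    """Energy of a second configuration at the same clock frequency,
    given its cycle count and power relative to the baseline."""
    return baseline_energy * cycle_ratio * power_ratio


E1_NJ = 1271.0       # configuration 1 energy, from the text
CYCLE_RATIO = 0.106  # configuration 2 cycles relative to configuration 1
POWER_RATIO = 0.988  # configuration 2 power relative to configuration 1

e2_nj = scaled_energy(E1_NJ, CYCLE_RATIO, POWER_RATIO)
reduction = E1_NJ / e2_nj

print(f"Configuration 2 energy: {e2_nj:.0f} nJ")
print(f"Reduction factor: {reduction:.2f}")
```

Because both configurations run at the same 10-MHz clock, the frequency cancels out of the ratio, so only the relative cycle count and relative power are needed.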

Conclusion

The goal of virtually every wearable and portable IoT device is to be able to provide more functionality and processing capability while using minimum power. Achieving this goal gives users a much better experience and maintains or extends the device’s battery life.

This ever-increasing demand for smaller devices with more functionality, longer battery life, and shorter time-to-market has accelerated the need for a new breed of low-power embedded processors and subsystems. Standard and general-purpose processors are less and less suited to the demands of these kinds of applications.

The configurable and extensible ARC EM4 Processor, with lower power and higher performance efficiency than competing 32-bit processor cores, is an example of the new breed of processors needed to meet the demands of these applications.

Paul Garden is the product marketing manager for DesignWare ARC Processors at Synopsys. He brings 20 years of experience in the field of CPU processor IP and 8-, 16-, and 32-bit microcontrollers. Prior to joining Synopsys, he held a variety of engineering and marketing positions at Microchip, Renesas, and ARM, where he was product marketing manager for the ARM® Cortex™-M3 MCU core and the ARM926EJ-S™ application processor CPU. He holds a bachelor of engineering degree in electronic engineering from the University of Plymouth in the U.K.