Can developers use a 32-bit architecture with a clear upgrade path even when low power and compact size are high on the list of requirements? Arm Ltd. attempts to answer that question with its Cortex M3 processor, which offers a much smaller footprint and lower power consumption than Arm's earlier cores. As a result, it can now compete in areas that were considered out of bounds for higher-performance Arm processors.
By leveraging the popularity of the 32-bit Arm architecture, the company officially moves into the high-performance 8-bit space. Expect the Cortex M3 architecture (see the figure) to be used by companies that already incorporate Arm processors in off-the-shelf microcontrollers as well as custom designs.
The Cortex M3 design is small and fast, but Arm didn't scrimp on features. In fact, the new architecture includes some twists that higher-end processors may adopt in the future. For example, bit banging, in which software drives individual I/O bits directly, is common in microcontroller applications. Atomic bit operations are also necessary for efficient real-time operating-system (RTOS) support. The Cortex M3 significantly improves bit-handling performance over earlier Arm cores.
Of course, squeezing a 32-bit processor into a small package brings a few concessions. Caches disappear, and clock timing is designed so that processor-core performance matches flash-memory performance.
MAKING IT SMALLER
Right away, developers will notice that the Cortex M3 lacks the 32-bit Arm instruction set, including its single-instruction multiple-data (SIMD) instructions. Instead, the processor supports only the Thumb and Thumb 2 instruction sets, which many other Arm processors also support. These instructions provide access only to the most common operations, but their more compact encodings reduce program size.
On the Cortex M3, Thumb instructions execute more efficiently than Thumb 2 instructions, especially when the core runs at a speed that requires one wait state. The Cortex M3 also retains the 32-bit register set common to all Arm processors, which allows many applications to be ported without changes. Developers using a higher-level language won't have a problem migrating to the new platform.
MAKING IT FASTER
Performance is relative. The Cortex M3 is designed to operate at or near flash-memory speed, which is about 50 MHz. The core can run faster if a wait state is introduced on flash accesses. This, along with the lack of a cache, makes the design slower than the higher-performance Arm architectures. However, its performance is high compared to the 8- and 16-bit processors in the Cortex M3's target market. A Harvard architecture and a three-stage pipeline provide single-cycle execution, but with 32-bit operations and registers.
Bit handling was a weakness of earlier Arm architectures, so the Cortex M3's designers took a unique approach to single-bit manipulation. They designated an address range for bit operations in which each 32-bit word corresponds to one bit of ordinary memory. The bus controller handles all accesses in this range: a read returns the value of a single bit, and a write updates a single bit. The get/set instruction pair that software would otherwise need is combined into one atomic operation, so no change to the instruction set is required.
One feature not found on most microcontrollers is a hardware divider. Division is very compute intensive when done in software, so with hardware divide the Cortex M3 can easily handle divides while running at lower clock speeds and consuming less power.
Interrupt response time is less critical on a faster processor, which responds more quickly simply because of its higher clock rate. Because the Cortex M3 runs at more modest clock speeds, Arm's designers enhanced interrupt response time using a number of techniques. First, there are 32 vectored interrupts. Second, interrupt vector addresses are passed directly to the core when an interrupt occurs, allowing the interrupt to be processed early while the pipeline clears. Third, processor state is automatically saved and restored when an interrupt occurs. Finally, features like tail chaining and preemption enable multiple interrupts to be handled more efficiently.
Tail chaining occurs when a second interrupt is pending as the current service routine finishes. In this case, the pending lower-priority interrupt is handled with a minimal transition between the two service routines: the processor skips restoring the saved state and then saving it again. Preemption occurs when a higher-priority interrupt arrives while a lower-priority interrupt is being serviced. In this case, the processor suspends the current routine and switches to the higher-priority one.