A New Player In The 32-Bit Processor Field

The AVR architecture blends 32-bit power with the elegance of its 8-bit brethren.


William Wong

February 02, 2006


Atmel took a turn away from the pack when it designed its 8-bit AVR. Now, the company is bucking the trend toward the 32-bit ARM architecture with its AVR32 processor architecture. Needless to say, ARM and its partners don't have the 32-bit market sewn up by any means.

In fact, a range of popular 32-bit microprocessors is available, including three families from Freescale alone. The competition should be healthy. Still, the AVR32 melds system design components like DSP and single-instruction/multiple-data (SIMD) instructions, Java bytecode support, and a compact instruction set.

The AVR32 runs in standard and Java bytecode modes. In standard mode, it can execute 16- or 32-bit instructions without switching modes. Most instructions are 16 bits, which reduces code size and effectively increases cache performance because more instructions fit into the cache. This is one reason why the ARM Thumb and Thumb-2 instruction sets have become so popular. However, the AVR32 doesn't incur mode-switch overhead when the need arises for 32-bit instructions.

The AVR32 only requires a seven-stage pipeline (Fig. 1). A short pipeline reduces overhead due to stalls. It also allows for more aggressive analysis because timing constraints aren't as critical as they are with some other system architectures. As a result, features like dynamic branch prediction can essentially implement zero-cycle-loop instructions, which are key to improving DSP performance.

Thanks to conditional return instructions, there's more inline execution versus the test/branch combination used with other architectures. Individually, the architectural fine-tuning may seem trivial. But combined, the features add up to greater performance. As a result, the AVR32 can run at a lower clock rate and still perform the same function as other architectures. Also, lower clock requirements reduce power consumption. This is vital for Atmel's targeted product areas, such as portable multimedia devices.

A REGULAR ARCHITECTURE
Atmel designers kept the system architecture simple. It uses a 16-register register file with a minimal number of mirrored registers for hardware context switching (Fig. 2). The AVR32 also features four levels of interrupt priority. It supports up to 64 interrupt groups and up to 32 interrupt lines per group, and each group has its own priority. This provides a very flexible interrupt control structure.

Interrupt 3, the highest-priority interrupt, mirrors a half-dozen additional registers. This allows many interrupt service routines to run without saving any additional system state. It also enables interrupts to be processed with minimal overhead.
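The group-and-priority scheme described above can be pictured with a small software model. This is only a sketch: the structure layout, the function names, and the tie-break rule (lowest group index wins among equal priorities) are assumptions made for illustration, not Atmel's interrupt controller registers or behavior.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_GROUPS      64  /* from the article: up to 64 interrupt groups */
#define LINES_PER_GROUP 32  /* from the article: up to 32 lines per group  */
#define NUM_LEVELS      4   /* from the article: four priority levels      */

/* Hypothetical software model; not the real controller's register map. */
typedef struct {
    uint32_t pending[NUM_GROUPS]; /* one pending bit per interrupt line    */
    uint8_t  level[NUM_GROUPS];   /* per-group priority, 0..3 (3 = highest) */
} intc_model;

/* Mark one line of one group as pending. */
void intc_raise(intc_model *c, int group, int line)
{
    assert(group >= 0 && group < NUM_GROUPS);
    assert(line >= 0 && line < LINES_PER_GROUP);
    c->pending[group] |= (uint32_t)1 << line;
}

/* Return the pending group with the highest priority level, or -1 if
 * nothing is pending. Equal levels resolve to the lowest group index,
 * which is an assumption of this sketch. */
int intc_next_group(const intc_model *c)
{
    int best = -1;
    for (int g = 0; g < NUM_GROUPS; g++) {
        assert(c->level[g] < NUM_LEVELS);
        if (c->pending[g] != 0 &&
            (best < 0 || c->level[g] > c->level[best])) {
            best = g;
        }
    }
    return best;
}
```

Because priority is attached to the group rather than to each line, a designer can bundle related peripherals into one group and retune their urgency by changing a single level value.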

The AVR32 includes a number of common instructions that typically take multiple instructions on other architectures. For example, certain instructions move selected blocks of registers. This is similar to instructions found in the new Texas Instruments MSP430X architecture (see "16-Bit Architecture Grows To 1 Mbyte" at www.elecdesign.com, ED Online 11528). Register-to-register block moves occur in a single cycle.

The AVR32 is a big-endian architecture. But it implements a host of pack and extract operations with a 32-bit barrel shifter that simplify little-endian support. These instructions also come in handy for structure manipulation. The processor can manipulate 64-bit values as well.
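To make the endian-conversion and extract ideas concrete, here is what those operations compute, written in portable C. The function names are hypothetical, and the code says nothing about which AVR32 instructions a compiler would actually emit; it only illustrates the shift-and-mask work that a barrel shifter performs in hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word, converting between
 * big-endian and little-endian representations. */
uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Extract an unsigned bit field of `width` bits starting at bit `lsb`,
 * the kind of operation used when unpacking structure members. */
uint32_t extract_field(uint32_t x, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width <= 32 && lsb < 32 && lsb + width <= 32);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lsb) & mask;
}
```

For example, `swap_bytes32(0x11223344u)` yields `0x44332211`, and `extract_field(0xABCD1234u, 8, 8)` yields `0x12`.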

The balance of the system architecture is fairly conventional. Data and code caches provide better performance. The paged and segmented memory management unit (MMU) can handle any operating system.

However, Atmel designers still have a few tricks up their sleeves. For example, a four-entry circular buffer can hold return addresses pushed into memory. It allows the values to be used immediately from the buffer instead of being read from memory, delivering better performance. This is transparent to the application and compiler, though applications that use nonstandard returns must explicitly flush the buffer.
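The return-address buffer can be sketched as a four-entry circular stack. The model below is an assumption-laden illustration: the type and function names are invented, and the overwrite-oldest policy on overflow is a guess at reasonable behavior rather than a documented detail of the AVR32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RAS_ENTRIES 4  /* from the article: a four-entry circular buffer */

/* Hypothetical model of the return-address buffer. */
typedef struct {
    uint32_t addr[RAS_ENTRIES];
    unsigned top;    /* index of the next slot to write   */
    unsigned count;  /* number of valid entries, 0..4     */
} return_stack;

/* Record a return address on a call. When the buffer is full, the
 * oldest entry is overwritten (assumed policy). */
void ras_push(return_stack *s, uint32_t ret_addr)
{
    s->addr[s->top] = ret_addr;
    s->top = (s->top + 1) % RAS_ENTRIES;
    if (s->count < RAS_ENTRIES) {
        s->count++;
    }
}

/* On a return, try to take the address from the buffer. Returns false
 * on a miss, in which case the address would be read from memory. */
bool ras_pop(return_stack *s, uint32_t *ret_addr)
{
    assert(ret_addr != NULL);
    if (s->count == 0) {
        return false;
    }
    s->top = (s->top + RAS_ENTRIES - 1) % RAS_ENTRIES;
    *ret_addr = s->addr[s->top];
    s->count--;
    return true;
}

/* Invalidate all entries, as code with nonstandard returns must do. */
void ras_flush(return_stack *s)
{
    s->top = 0;
    s->count = 0;
}
```

The flush routine shows why nonstandard returns need care: if software changes a return address in memory, the buffered copy is stale, and discarding the buffer forces the next return to read the correct value from memory.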

The DSP and SIMD sections use a straightforward design with a few interesting tweaks that increase performance, reduce overhead, and get the job done using less power. For instance, there's delayed writeback of the 48-bit temporary accumulator used in a multiply-accumulate mode.

That means each iteration of a loop only needs to load one value instead of the two typically used in other architectures. This can be employed to implement fast finite-impulse-response (FIR) filtering algorithms. The processor supports fractional multiplications with saturation, rounding, and scaling.
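A multiply-accumulate FIR kernel with fractional arithmetic might look like the following in C. This is a minimal sketch under stated assumptions: samples and coefficients are taken to be Q15 fixed-point values, a 64-bit integer stands in for the 48-bit accumulator, and the function names are hypothetical. It shows the arithmetic (accumulate, round, scale, saturate), not the delayed-writeback mechanism, which is a hardware detail invisible at this level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 16-bit Q15 range. */
static int16_t saturate_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scale by 2^-15 with rounding to nearest. Division plus a correction
 * is used so the result does not depend on how the compiler shifts
 * negative values. */
static int64_t round_shift15(int64_t acc)
{
    int64_t r = acc + (1 << 14);
    int64_t q = r / 32768;
    if (r % 32768 < 0) {
        q--;
    }
    return q;
}

/* One FIR output: the sum of coeff[k] * samples[k] over all taps.
 * The caller is assumed to order `samples` to match `coeff`. */
int16_t fir_q15(const int16_t *coeff, const int16_t *samples, size_t taps)
{
    assert(coeff != NULL && samples != NULL);
    int64_t acc = 0;  /* wide accumulator, standing in for 48 bits */
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)coeff[k] * (int32_t)samples[k];
    }
    return saturate_q15(round_shift15(acc));
}
```

Keeping the running sum in a wide accumulator and saturating only once at the end avoids the intermediate overflow that would corrupt the result if each product were narrowed to 16 bits immediately.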

Likewise, SIMD support addresses common multimedia algorithms such as MPEG-4 motion compensation. MPEG-4 encoding software also uses instructions to handle operations like the sum of absolute differences. These types of operations are found in competing 32-bit multimedia architectures, but you won't see them in conventional 32-bit architectures.
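For readers unfamiliar with the sum of absolute differences (SAD), the scalar C version below shows what the operation computes when comparing two pixel blocks during motion estimation. The function name and parameters are hypothetical; a SIMD instruction would process several of these byte differences at once, whereas this sketch handles one pixel per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences between two 8-bit pixel blocks of
 * `width` x `height` pixels. `stride` is the distance in bytes
 * between the starts of consecutive rows in both buffers. */
uint32_t sad_block(const uint8_t *cur, const uint8_t *ref,
                   size_t stride, size_t width, size_t height)
{
    assert(stride >= width);
    uint32_t sum = 0;
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sum += (uint32_t)(d < 0 ? -d : d);
        }
    }
    return sum;
}
```

An encoder evaluates this measure for many candidate block positions and picks the one with the smallest sum, which is why hardware support for the inner operation pays off.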
