AVR32 32-bit MCU/DSP - AVR32 UC Core
Redefining Performance for 32-bit Flash Microcontrollers
AVR®32 UC is a 32-bit RISC core with DSP instructions, the second core to be developed based on the Atmel® AVR32 architecture, introduced in 2006.
The AVR32 UC core delivers up to 1.3 Dhrystone MIPS/MHz, running from on-chip Flash, outperforming the ARM7-TDMI core by a factor of two for the same code density. Although comparable to ARM®'s Cortex-M3 in gate count, Atmel's AVR32 UC core is the only 32-bit RISC core in this size range to include single-cycle DSP instructions and to deliver higher performance together with smaller code size.
First Core to integrate SRAM in the pipeline – The AVR32 UC core is the first core in the industry to integrate single-cycle read/write SRAM with a direct interface to the CPU that bypasses the system bus to achieve faster execution, cycle determinism and lower power consumption. A high-speed bus (HSB) slave interface allows DMA controllers or other HSB masters to write to or read data directly from the closely coupled SRAM. Arbitration is performed if the CPU and an HSB master request access simultaneously. The priority scheme is programmable to suit different applications.
The AVR32 UC core includes power management functions, a memory protection unit (MPU), and a 32-bit single-cycle access Flash interface. It also features a 6-level priority interrupt controller including non-maskable interrupt (NMI) with fast event handling, and a three-stage pipeline that does not require instruction or data caches, data forwarding, hazard detection or branch prediction.
3-Stage Single-cycle Pipeline – The AVR32 UC core has a three-stage pipeline. The first stage, instruction fetch, has been specially designed to optimize instruction fetch from on-chip Flash memory. It prefetches one 32-bit or two 16-bit instructions every clock cycle into an internal instruction buffer. The buffer prevents pipeline stalls during sequential program execution, so execution from on-chip Flash can be sustained at the maximum CPU clock frequency without the CPU having to stall waiting for instructions from the Flash.
The second stage decodes instructions and generates necessary signals for instruction execution.
The third stage is made up of three execution sub-units: the ALU, multiplication, and load/store units. The ALU performs arithmetic and logical operations, including hardware division. The multiply unit executes the numerous multiply and multiply-and-accumulate (MAC) operations available from the instruction set architecture (ISA), and the load/store unit performs single-cycle memory accesses to SRAM or accesses on the high-speed bus (HSB). There are no data hazards in the UC core, so the register file can be updated during the same clock cycle as the instruction is executed. This makes assembly programming simpler than with deeper pipelines, as no code scheduling is needed.
Instruction Set Architecture with freely intermixable 16/32-bit instructions – The AVR32 UC core shares the same instruction set architecture (ISA) as its AVR32 AP parent, with over 220 instructions available as 16-bit compact and 32-bit extended instructions. The AVR32 ISA is designed to minimize data transactions between the core and memories, saving both power and clock cycles. The compiler automatically selects the most efficient compact or extended form of each instruction, giving the user both the fastest and the most compact code possible.
Load/store Instructions – Load/store instructions are provided for accessing byte (8-bit), half-word (16-bit), word (32-bit) and double word (64-bit) data types. The instructions have multiple addressing modes for efficient access to tables and other data structures. The powerful addressing modes reduce the number of load/store instructions that must be executed (typically 30% of cycles in a conventional processor are used for load/store instructions). For example, the AVR32 UC “load with extracted index” instruction halves the number of memory accesses in common cryptography algorithms, compared to traditional architectures.
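As a rough illustration of the access pattern that a “load with extracted index” addresses, the C sketch below uses one byte of a word as the index into a 256-entry table, as table-driven ciphers commonly do. The function names, the table layout and the byte numbering are assumptions made for this example; the sketch shows only the computation and makes no claim about how a compiler maps it onto AVR32 instructions.

```c
#include <stdint.h>

/* Hypothetical helper: use one byte of index_word as the index into a
 * 256-entry table of 32-bit words. byte_pos selects the byte
 * (0 = least significant, 3 = most significant). */
uint32_t lookup_by_byte(const uint32_t table[256],
                        uint32_t index_word,
                        unsigned byte_pos)
{
    uint32_t idx = (index_word >> (8u * (byte_pos & 3u))) & 0xFFu;
    return table[idx];
}

/* Hypothetical mixing step: XOR together the four table entries selected
 * by the four bytes of w, in the style of a table-driven cipher round. */
uint32_t mix_word(const uint32_t table[256], uint32_t w)
{
    return lookup_by_byte(table, w, 3) ^ lookup_by_byte(table, w, 2) ^
           lookup_by_byte(table, w, 1) ^ lookup_by_byte(table, w, 0);
}
```

Each call to lookup_by_byte combines a shift, a mask and an indexed load, which is the kind of sequence an addressing mode with built-in index extraction is meant to shorten.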
In addition to the regular load and store instructions, the AVR32 ISA has instructions that can modify data read from the register file before storing it to memory, and modify data read from memory before storing it to the register file. On-the-fly data manipulations include load-and-insert bit fields, load-and-swap and store-and-swap. These instructions are well suited to protocol handling and endianness conversions.
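To make the endianness use case concrete, the following portable C sketch shows the two operations that a load-and-swap style instruction would combine: reading a word and reversing its byte order. The function names are hypothetical, and the code is plain C rather than an AVR32 intrinsic.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Read a big-endian 32-bit field from a byte buffer, independent of the
 * endianness of the machine running the code. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

In protocol handling, a header field received in network byte order would typically go through a helper like load_be32 before being used in arithmetic.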
Atomic Memory Manipulation Instructions – The AVR32 UC instruction set includes atomic instructions to manipulate mutexes and semaphores, and for general bit manipulation. Semaphores and mutexes are used by real-time operating systems (RTOSs) to prevent resource contention during the execution of a process. Bit manipulation is used to control on-chip peripherals.
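The text does not name the atomic instructions, so the sketch below uses the portable C11 <stdatomic.h> interface as a stand-in to show the kind of primitive an RTOS builds a mutex on: an atomic test-and-set. The type and function names are hypothetical, and how a given toolchain lowers these calls on AVR32 is not covered here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal mutex built on a single atomic flag. */
typedef struct {
    atomic_flag locked;
} sketch_mutex;

/* Put the mutex into the unlocked state. */
void sketch_mutex_init(sketch_mutex *m)
{
    atomic_flag_clear(&m->locked);
}

/* Try to take the mutex without blocking.
 * Returns true if the caller now owns it, false if it was already held. */
bool sketch_mutex_try_lock(sketch_mutex *m)
{
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&m->locked,
                                              memory_order_acquire);
}

/* Release the mutex. */
void sketch_mutex_unlock(sketch_mutex *m)
{
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

Because the read and the write of the flag happen as one indivisible step, two tasks can never both see the mutex as free and take it at the same time.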
DSP Instructions – The AVR32 UC core multiply-accumulate unit executes, in a single cycle, a wide range of multiply and multiply-and-accumulate instructions on standard and fractional numbers, with and without saturation and rounding. Multiply or MAC results can be 32, 48 or 64 bits wide; 48- and 64-bit results are placed in two registers. DSP instructions also include many add and subtract instructions as well as data formatting instructions such as data shift with saturation and rounding.
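For readers unfamiliar with fractional arithmetic, here is a small C sketch of what a saturating fractional multiply-and-accumulate computes. It assumes the common Q15 input and Q31 accumulator formats, which the text does not specify, and the function names are hypothetical. It models the arithmetic only, not the timing or encoding of any AVR32 instruction.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate result to the signed 32-bit range. */
int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Saturating fractional MAC: acc (Q31) + a (Q15) * b (Q15).
 * The raw product of two Q15 values is in Q30, so it is doubled to
 * align it with the Q31 accumulator before the saturating add. */
int32_t mac_q15_sat(int32_t acc, int16_t a, int16_t b)
{
    int64_t prod_q31 = (int64_t)a * (int64_t)b * 2;
    return sat32((int64_t)acc + prod_q31);
}
```

For example, 0.5 × 0.5 is mac_q15_sat(0, 0x4000, 0x4000), which yields 0x20000000, i.e. 0.25 in Q31. Multiplying -1.0 by -1.0 would overflow to +1.0, which Q31 cannot represent, so the result saturates to INT32_MAX instead of wrapping around.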
Fast Event Handling – The AVR32 UC core event handling system supports events such as non-maskable interrupt (NMI), exceptions (illegal opcode, bus error), and four interrupt priority levels. Events have different priority levels. Pending events of a higher priority class can preempt ongoing events of lower priority. Upon event detection, the status register and program counter of the current context, plus six general purpose registers, are automatically stored to the stack. The first instruction of the event handler is executed within 12 clock cycles, from an autovectored handler address. To contain interrupt latency, multicycle instructions can be aborted by pending interrupts, so interrupt latency is limited to a maximum of 16 clock cycles.
A Flash Microcontroller System – The AVR32 UC core includes an on-chip RC oscillator as the main clock source. An on-chip power-on reset and brown-out detector ensure device operation over the guaranteed power supply voltage range, preventing hazardous operations such as spurious Flash write accesses that could damage the Flash memory content. A hardware watchdog is clocked from the on-chip RC oscillator and detects software hazards from bad program execution or clock defects, such as external oscillator damage.
The AVR32 UC core can operate in privileged or unprivileged mode. The privileged mode is often used for real-time operating systems, allowing access to all system resources and using a separate system stack. Unprivileged mode is used for task execution, and limits access to some of the system resources. Safe mechanisms are used to transfer control between the different privilege modes.
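The cycle counts above translate into time once a CPU clock frequency is chosen. The helper below is a minimal sketch of that conversion; the function name is hypothetical, and the 60 MHz figure used in the note after the code is an assumption for illustration, not a frequency taken from this document.

```c
#include <stdint.h>

/* Convert a cycle count to nanoseconds at a given CPU clock frequency.
 * The result is truncated to a whole number of nanoseconds.
 * cpu_hz must be non-zero. */
uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return ((uint64_t)cycles * 1000000000ull) / cpu_hz;
}
```

At an assumed 60 MHz clock, cycles_to_ns(12, 60000000) gives 200 ns for the 12-cycle handler entry, and cycles_to_ns(16, 60000000) gives 266 ns (truncated) for the 16-cycle worst case.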
A memory protection unit (MPU) controls memory allocation and manages access privileges. The MPU allows restriction of read / write / execute access to different memory areas depending on the privilege mode.
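The access rule described here can be modelled in a few lines of C. The sketch below is a software model only: the region structure, the permission bits, the first-match lookup and the deny-by-default behaviour for unmapped addresses are all assumptions made for this example and do not describe the actual AVR32 MPU register layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits. */
enum { PERM_R = 1u, PERM_W = 2u, PERM_X = 4u };

/* Hypothetical region descriptor: an address range with separate
 * permissions for privileged and unprivileged mode. */
typedef struct {
    uint32_t base;
    uint32_t size;
    uint8_t  priv_perms;
    uint8_t  unpriv_perms;
} mpu_region;

/* Return true if an access of kind `wanted` (a mask of PERM_* bits) to
 * `addr` is allowed in the given mode. The first matching region decides;
 * addresses outside every region are denied (an assumption of this model). */
bool access_allowed(const mpu_region *regions, size_t count,
                    uint32_t addr, bool privileged, uint8_t wanted)
{
    for (size_t i = 0; i < count; i++) {
        const mpu_region *r = &regions[i];
        if (addr >= r->base && addr - r->base < r->size) {
            uint8_t perms = privileged ? r->priv_perms : r->unpriv_perms;
            return (perms & wanted) == wanted;
        }
    }
    return false;
}
```

With a code region marked read/execute for both modes and a data region marked read/write for privileged mode only, an unprivileged task can run its code but cannot read or write the protected data.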
Up to 20% Better Code Density than ARM7 (Thumb) or Cortex M3 (Thumb2) – AVR32 UC code is consistently 5% to 20% smaller than code compiled for the ARM Thumb® instruction set. More significantly, when code is optimized for execution speed, AVR32 UC code is 30% to 50% more compact than code compiled for the ARM instruction set (note1). More compact code increases processor throughput and reduces power consumption by reducing the number of memory accesses.
Note1: According to ARM, code compiled for the Thumb2 instruction set is 26% smaller than ARM code. (An introduction to the Cortex-M3 processor – White paper, ARM)