The XMM register set gets expanded to 256 bits
Intel's Streaming SIMD Extensions (SSE) debuted in 1999 with the Pentium III and have since grown into the latest incarnation, dubbed AVX for Advanced Vector Extensions. AVX is expected in Intel's upcoming Sandy Bridge processor family. It is only one of several improvements planned for the upcoming 32nm chips, but it addresses a number of significant areas, including AES encryption.
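The width of the XMM (eXtended MultiMedia) register set has also increased from 128 to 256 bits. The widened registers are named YMM, and each XMM register maps onto the lower half of its YMM counterpart. AVX is designed so that it can be extended to 512-bit and 1024-bit registers in the future. The number of registers is unchanged: 8 in 32-bit mode and 16 in 64-bit mode. This allows backwards compatibility and limits the impact on applications, compilers and operating systems.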
The vector instruction set architecture (ISA) is relatively independent of the CPU's normal ISA. SSE2 added many new instructions, including a set of cache control instructions designed to minimize cache pollution. SSE3 added DSP and 3D operations. It also implemented floating-point-to-integer conversion without the need to change the global rounding mode. SSE4 incorporated a number of packing and packed arithmetic instructions for advanced text processing, as well as a CRC32 instruction. And now there is AVX.
The AVX enhancements include a number of instructions that do not target multimedia applications directly. For example, the AES support targets encryption and authentication and should have a significant impact on communication applications. Likewise, the PCLMULQDQ instruction performs a carry-less multiplication, an operation used in advanced block cipher modes. Other new instructions include broadcast, permute and fused multiply-add. AVX now includes hundreds of instructions with mnemonics only a compiler writer can love.
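To make the carry-less multiplication mentioned above concrete, here is a minimal software sketch in Python. It is not Intel's implementation; the function names clmul and clmul64 are hypothetical and chosen only for illustration. The idea is the same as schoolbook binary multiplication, except that the partial products are combined with XOR instead of addition, so no carries propagate between bit positions.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Each operand is treated as a polynomial over GF(2): bit i is the
    coefficient of x**i. Partial products are combined with XOR, so
    no carry ever moves from one bit position to the next.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            # Add (XOR) a copy of `a` shifted to this bit position.
            result ^= a << shift
        shift += 1
    return result


def clmul64(a: int, b: int) -> int:
    """Hypothetical helper: limit both operands to 64 bits.

    A 64-bit by 64-bit carry-less product always fits in 128 bits.
    """
    mask = (1 << 64) - 1
    return clmul(a & mask, b & mask)
```

For instance, clmul(0b11, 0b11) yields 0b101, whereas ordinary multiplication of 3 by 3 gives 9 (0b1001). A hardware instruction produces the whole product in one operation, where this loop needs one step per operand bit.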
The register size increase provides support for 256-bit vectors of floating-point numbers: eight single-precision or four double-precision values per register. This provides better performance as well as better overall power efficiency. AVX also supports three- and four-operand instructions, so the destination need not overwrite a source operand. In older SSE code, the XMM0 register is often used implicitly as an extra operand. Different versions of AVX will be designated by the register width: the new incarnations are referred to as AVX-128 and AVX-256. Existing assembler mnemonics that use the extended registers gain a V prefix (ADDPS becomes VADDPS, for example), reflecting the new VEX instruction encoding. As with SSE, they are upward compatible.
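A small model may help show what a 256-bit register holds and what a three-operand instruction buys. The Python sketch below is an illustration only, not how the hardware or any compiler works: the names lanes, pack_ps and vaddps_model are hypothetical, and a register is simply represented as a 32-byte value holding eight single-precision floats.

```python
import struct

REGISTER_BITS = 256  # width of one AVX-256 register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one register."""
    return REGISTER_BITS // element_bits


def pack_ps(values):
    """Pack eight single-precision floats into a 32-byte (256-bit) value."""
    if len(values) != lanes(32):
        raise ValueError("expected exactly eight values")
    return struct.pack("<8f", *values)


def vaddps_model(src1: bytes, src2: bytes) -> bytes:
    """Model of a three-operand packed add.

    The result is a new value, so neither source is overwritten, in
    contrast to a two-operand form where the destination is also a source.
    """
    a = struct.unpack("<8f", src1)
    b = struct.unpack("<8f", src2)
    return struct.pack("<8f", *(x + y for x, y in zip(a, b)))
```

With this model, lanes(32) is 8 and lanes(64) is 4, which is where the eight single-precision or four double-precision figures come from. Adding pack_ps([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]) to pack_ps([0.5] * 8) gives eight sums in one step, and both inputs remain available afterwards.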
Intel's AVX-256 ups the ante for vector processing in general, multimedia processing specifically and now encryption. It will have a major impact on high-performance computing applications as well as improving performance on desktops and servers in general. Intel compilers will support AVX, and most third-party compilers will follow suit as they have in the past. Most will have updated versions available before the new hardware reaches general availability, which is expected later this year.