Smaller transistors and larger die sizes are radically changing the way 64-bit processors are implemented and where they're found. This holds true particularly for high-performance server and desktop computing, but also for embedded applications where 64-bit performance and throughput make many jobs practical.
There's a surprising amount of variety in the 64-bit arena, even taking into consideration the target audience. Some architectures span the space from embedded to giant server clusters (e.g., the PowerPC architecture). Others span the compatibility space (e.g., MIPS Technologies' 32- and 64-bit embedded processor line). Most provide some level of upward compatibility (e.g., AMD's support of the x86 architecture). Then there are those that start a line of their own (e.g., Intel's Explicitly Parallel Instruction Computing, or EPIC, architecture).
Products such as AMD's Opteron and Athlon 64, Intel's Itanium 2, and Sun's UltraSparc III architectures are explicitly designed and marketed for PC systems. Others, like SuperH's SH-5 (see Drill Deeper 7364, "64-Bit IP Provides Embedded Solutions," at www.elecdesign.com), are only available as intellectual property (IP) to be incorporated into custom, embedded applications. We'll look at the architectures that can be found in off-the-shelf products. For example, Broadcom is just one of many vendors providing a number of standard parts that incorporate the MIPS64 architecture.
AND THE WINNER IS... Don't look for a dominant architecture. CISC, RISC, and the new EPIC designs are all delivering impressive performance, and none is likely to disappear any time soon. Most surprising is that each approach delivers comparable throughput. Even so, performance varies widely depending on what surrounds the core, including the amount and speed of the cache, the type and speed of the bus interface, and the silicon technology.
Compiler technology is also very important to the performance of 64-bit processors (see Drill Deeper 7365, "Compilers Critical To CPU's Success," at www.elecdesign.com). This is especially true when utilizing Intel's EPIC architecture and the SIMD vector support in IBM's AltiVec enhancements to the PowerPC.
A number of common tactics are used in the design of these high-performance processors. The first is large, multiple caches. While it's possible to get a MIPS processor with only a level 1 cache, most systems have at least a level 2 cache. Some, like Intel's Itanium 2, have megabytes of level 3 cache. Cache is key not only for individual execution threads, but also for handling the large number of threads normally found in most application environments. The second common item is an on-chip memory controller. Moving the memory controller closer to the core reduces latency.
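To see why extra cache levels pay off, it helps to run the standard average-memory-access-time calculation. The sketch below is illustrative only: the function name, the cycle counts, and the miss rates are assumptions chosen for the example, not measurements of any processor discussed here.

```python
def average_access_time(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_time_cycles, local_miss_rate) tuples, ordered
            from the level 1 cache outward.
    memory_latency: cycles needed to reach main memory after the last miss.
    """
    # Work from the outermost level back toward the core: the cost of a
    # miss at one level is the average access time of the next level out.
    penalty = memory_latency
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


# Hypothetical numbers, for illustration only.
l1_only = average_access_time([(2, 0.05)], memory_latency=200)
l1_and_l2 = average_access_time([(2, 0.05), (10, 0.2)], memory_latency=200)
print(f"L1 only: {l1_only:.1f} cycles, L1 + L2: {l1_and_l2:.1f} cycles")
```

With these assumed figures, adding a level 2 cache cuts the average access time by more than half. Lowering `memory_latency`, which is what an on-chip memory controller aims to do, shrinks the result in the same way.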
This crop of 64-bit processors may not be on top in terms of numbers shipped, but they definitely come out on top when it comes to crunching numbers.
THE 64-BIT x86 AMD aims to put a 64-bit processor on the desktop, laptop, and server. The Athlon 64 and Opteron have different names, yet they share a common AMD64 64-bit core that extends the x86 architecture in a fashion similar to past x86 migrations, such as the move from the 8086 to the 80286. In fact, the 32-bit legacy mode simply makes the processor look like a very fast 32-bit Athlon processor.
The AMD64 doubles the size and number of registers compared to the 32-bit Athlon and Pentium 4. AMD determined that 16 registers were the best combination for high performance, system overhead, and hardware real estate. The 64-bit registers are accessible in native 64-bit mode or in mixed 32/64-bit mode. AMD accomplishes this magic by including only three new instructions. Two are for mode changing, and one is a prefix byte that allows the CISC instruction stream to refer to the 64-bit registers. The average 32-bit instruction length is 3.2 bytes, whereas the 64-bit average only grows to 3.7 bytes.
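The prefix byte mentioned above is called the REX prefix in AMD's architecture documentation (the name does not appear in this article). In 64-bit mode it occupies the values 0x40 through 0x4F: the high nibble is fixed at 0100, and the low four bits (W, R, X, B) select 64-bit operand size and supply the extra register-number bit that doubles the register count from 8 to 16. A minimal Python sketch of decoding it follows; the helper names are hypothetical.

```python
def decode_rex(byte):
    """Return the W, R, X, B flags if byte is a REX prefix, else None.

    Only meaningful for code running in 64-bit mode; in 32-bit code the
    same byte values encode single-byte INC/DEC instructions.
    """
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": bool(byte & 0x08),  # 1 = 64-bit operand size
        "R": bool(byte & 0x04),  # extends the ModRM reg field
        "X": bool(byte & 0x02),  # extends the SIB index field
        "B": bool(byte & 0x01),  # extends the ModRM r/m or SIB base field
    }


def extended_register(extension_bit, field3):
    """Combine a REX extension bit with a 3-bit register field (0-15)."""
    return (int(extension_bit) << 3) | (field3 & 0x7)


# The bytes 48 89 C3 encode "mov rbx, rax": 0x48 is a REX prefix with W set.
print(decode_rex(0x48))
print(decode_rex(0x89))  # the opcode byte itself is not a REX prefix
```

Because the prefix is needed only when an instruction touches 64-bit operands or the eight new registers, average code size grows modestly: going from 3.2 to 3.7 bytes per instruction is an increase of about 16%.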
HyperTransport is central to the AMD64 design. It provides high I/O bandwidth and doubles as a NUMA (non-uniform memory access) SMP (symmetric multiprocessing) link that makes the creation of multiprocessor systems a snap. Single-processor incarnations have a single, non-cache-coherent HyperTransport link. Chips intended for multiprocessor systems have three cache-coherent HyperTransport links.
As with most 64-bit designs, the AMD64 increases performance through a number of methods such as the use of HyperTransport and a low-latency, on-chip double-data-rate (DDR) memory controller. A superscalar design with a number of execution units helps the AMD64 maintain high code execution performance.
The AMD64 architecture is new. Its success may push others to develop their own 64-bit x86 processors, but that's another story.
AMD ATHLON 64 AND OPTERON
• Target: Servers and PCs
• Availability: AMD
• Architecture: CISC
• Operating systems: x86-compatible operating systems, including Linux, Unix, and Windows
• Core: CISC, 6 instructions/cycle with double dispatch operations