The Intel x86 architecture has had a long and sometimes convoluted history. Challenged by advanced RISC and very-long-instruction-word (VLIW) architectures, it may be kept out of high-end applications. Still, the x86 architecture has a staying power that not only keeps it around, but allows it to flourish even in new areas, such as low-end embedded applications.
Originally available from a single source, the x86 architecture started as the 16-bit, multiple-clock/instruction 8086. Today, x86-architecture processors are available from multiple sources. Performance-oriented chips use highly pipelined execution units, and the architecture is moving from 32 bits into the 64-bit realm. While the original segmented-memory architecture of the 8086 remains, the flat memory space is the environment of choice for most x86-based operating systems, such as Windows ME, Windows 2000, and Linux.
All of this growth in the x86 space is due to such companies as Intel Corp., Advanced Micro Devices Inc. (AMD), VIA Technologies Inc., and Transmeta Corp. bringing the architecture to places where it hadn't been before. Intel is pushing state-of-the-art pipelining to keep performance climbing. In addition, it has merged the x86 architecture with its new 64-bit Itanium processor line. But this hardware support is more for compatibility than a long-term 64-bit plan for the x86 architecture. AMD, on the other hand, has pushed the x86 architecture into the 64-bit realm using new instructions and registers.
Adding new instructions and registers to the x86 architecture isn't a new phenomenon. In fact, the 8086 architecture first advanced along this path with the integration of a floating-point unit. Later advancement came with the addition of single-instruction/multiple-data (SIMD) instructions. These are part of Intel's multimedia extension (MMX) support and AMD's 3DNow! instructions.
The x86 architecture is being pushed and prodded from all sides. Intel's new Pentium 4 line sticks with the 32-bit architecture, but packs it with a 20-stage hyperpipeline to provide increased performance. AMD, alternatively, has pushed the x86 architecture into the 64-bit space. It will be interesting to see how this arena takes to the x86 architecture, especially given Intel's preference for combining its 32-bit x86 core with its 64-bit VLIW core in the 64-bit Itanium line.
The VLIW approach has cropped up more than once with the x86 architecture. The Transmeta Crusoe, for example, utilizes a VLIW core to execute x86 instructions, but it does this through a process called code morphing. The approach is significantly different from Intel's Itanium approach.
Power and performance aren't the only areas where innovation with the x86 architecture can be found. Higher integration with peripheral and peripheral-support chips is taking the x86 architecture into the embedded space, which is dominated by a large collection of different processors.
New implementations of the x86 architecture are more compact and use less power than previous versions, making them a real alternative to non-x86 designs. The x86 architecture is showing up in a number of system-on-a-chip (SoC) designs. Many of these provide PC compatibility, incorporating everything from parallel printer port interfaces to Universal Serial Bus (USB) hubs.
VIA Technologies incorporates the Northbridge support with the processor while National Semiconductor's GX1-based products incorporate a two-dimensional (2D) video accelerator. Furthermore, STMicroelectronics' STPC and Rise Technology's SCX501 combine video support with an x86 core. ZF Linux Devices' MachZ provides an interesting twist that lets designers use its cache as conventional memory so that the system can always boot.
Given this variety of x86 implementations, it's best to begin by examining one. Intel's Pentium 4 is a good place to start, as it's the successor to the popular Pentium III.
Hyperpipelining And Other Magic
Intel's Pentium 4 maintains the logical x86 architecture supported by the Pentium III, but its internal architecture is significantly different from the one in the Pentium III. Called the NetBurst microarchitecture, it's actually much different from most x86 processors (Fig. 1).
The Pentium 4 has a 3.2-Gbyte/s system bus interface that provides access to the external 400-MHz system bus. The Pentium 4 processor speed starts at 1.4 GHz.
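As a rough sanity check on the bus figures, the short Python sketch below works backward from the quoted 3.2-Gbyte/s peak bandwidth to the transfer rate the bus must sustain. It assumes an 8-byte (64-bit) data path, which the text does not state, and the function names are purely illustrative.

```python
# Back-of-the-envelope check of the quoted system bus bandwidth.
# ASSUMPTION: an 8-byte (64-bit) data path; the width is not given in the text.
BUS_WIDTH_BYTES = 8
QUOTED_BANDWIDTH_BYTES_PER_S = 3_200_000_000  # 3.2 Gbytes/s, from the text


def required_transfers_per_second(bandwidth_bytes_per_s: int, width_bytes: int) -> int:
    """Return the transfer rate needed to sustain the given peak bandwidth."""
    return bandwidth_bytes_per_s // width_bytes


def peak_bandwidth(transfers_per_s: int, width_bytes: int) -> int:
    """Return the peak bandwidth in bytes per second."""
    return transfers_per_s * width_bytes


rate = required_transfers_per_second(QUOTED_BANDWIDTH_BYTES_PER_S, BUS_WIDTH_BYTES)
print(f"{rate / 1e6:.0f} million transfers per second")
```

Under that width assumption, the bus has to move data 400 million times per second to reach its peak figure.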
Inside the Pentium 4 are an advanced transfer Level-2 (L2) cache, an execution trace cache, and support for Streaming SIMD (single-instruction/multiple-data) Extensions 2 (SSE2). SSE2 brings 144 new instructions and 128-bit register support. Rapid execution engines run at twice the core clock rate, completing simple integer instructions in half a clock tick. The chip also features enhanced multimedia and floating-point support.
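To see what a 128-bit SIMD register buys, consider the toy Python model below. It treats one 128-bit value as four independent 32-bit lanes and adds two such values lane by lane, with each lane wrapping around on overflow rather than carrying into its neighbor. The helper names (pack, unpack, packed_add) are invented for this illustration and don't correspond to any Intel API. The real hardware performs the whole operation with a single instruction.

```python
# Toy model of a 128-bit SIMD register holding four unsigned 32-bit lanes.
# ASSUMPTION: helper names are illustrative only; this is not an Intel interface.
LANE_BITS = 32
LANES = 4  # 4 lanes x 32 bits = 128 bits
LANE_MASK = (1 << LANE_BITS) - 1


def pack(values):
    """Pack four lane values into one 128-bit integer (lane 0 in the low bits)."""
    if len(values) != LANES:
        raise ValueError(f"expected {LANES} lane values")
    reg = 0
    for i, value in enumerate(values):
        reg |= (value & LANE_MASK) << (i * LANE_BITS)
    return reg


def unpack(reg):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add lane by lane; each lane wraps on overflow and never carries into the next."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])


a = pack([1, 2, 3, 0xFFFFFFFF])
b = pack([10, 20, 30, 1])
result = unpack(packed_add(a, b))
print(result)  # [11, 22, 33, 0]
```

The last lane overflows and wraps to zero without disturbing the other three, which is the property that lets one wide register stand in for four separate ones.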
Tying this all together is a rather deep, 20-stage execution pipeline that's twice the length of the P6 pipeline in the Pentium III (Fig. 2). This hyperpipelined technology allows the Pentium 4 to execute more than one instruction per clock cycle.
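The tradeoff behind a deeper pipeline can be sketched with a simple textbook model. The Python fragment below assumes an ideal single-issue pipeline with no stalls, flushes, or superscalar issue, so it understates what the real chips do. The 1.0-GHz figure for the 10-stage case is an illustrative assumption. The 1.4-GHz figure is the Pentium 4 starting speed quoted earlier.

```python
# Idealized pipeline timing model.
# ASSUMPTIONS: single issue, no stalls or branch flushes; clock rates are
# illustrative (1.0 GHz for 10 stages is assumed, 1.4 GHz is quoted in the text).


def cycles_to_complete(num_instructions: int, stages: int) -> int:
    """Cycles needed to run the instructions through an ideal pipeline."""
    if num_instructions == 0:
        return 0
    # The first instruction takes `stages` cycles; each later one finishes a cycle apart.
    return stages + num_instructions - 1


def run_time_seconds(num_instructions: int, stages: int, clock_hz: float) -> float:
    """Wall-clock time for the same work at a given clock rate."""
    return cycles_to_complete(num_instructions, stages) / clock_hz


N = 1000
cycles_10 = cycles_to_complete(N, 10)  # 1009 cycles
cycles_20 = cycles_to_complete(N, 20)  # 1019 cycles
time_10 = run_time_seconds(N, 10, 1.0e9)
time_20 = run_time_seconds(N, 20, 1.4e9)
print(cycles_10, cycles_20, time_20 < time_10)
```

In this model the extra stages cost only a handful of fill cycles, so the deeper pipeline wins whenever its shorter stages allow a meaningfully faster clock. What the model leaves out is the refill penalty after a mispredicted branch, which grows with pipeline depth.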
The pipeline runs fast enough that wire delays must be accounted for in its design. Likewise, some stages must be repeated to handle the amount of traffic in the pipeline and the complexity of the job. This allows designers to fine-tune the pipeline.
MicroOPs flow through the pipeline. They are generated from the x86 instructions that enter at the beginning of the pipeline, where the instructions are decoded and the resulting microOPs are placed into the execution trace cache. The cache has room for approximately 12k microOPs.
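The benefit of caching decoded microOPs rather than raw x86 instructions shows up most clearly in loops. The Python sketch below is a deliberately simplified model, not Intel's design: it caches microOPs per instruction address rather than as traces, uses a made-up decode table, and assumes least-recently-used replacement, none of which comes from the text. Only the capacity of roughly 12k microOPs is taken from the article.

```python
from collections import OrderedDict

# HYPOTHETICAL decode table mapping instruction text to microOPs.
# Real x86 decoding is far more involved; these entries are illustrative only.
DECODE_TABLE = {
    "add eax, [mem]": ["load tmp, [mem]", "add eax, tmp"],
    "inc ecx": ["add ecx, 1"],
    "push ebx": ["sub esp, 4", "store [esp], ebx"],
}


class TraceCache:
    """Toy cache of decoded microOPs, keyed by instruction address."""

    def __init__(self, capacity_uops=12_000):  # "approximately 12k" per the text
        self.capacity_uops = capacity_uops
        self.entries = OrderedDict()  # address -> list of microOPs
        self.used_uops = 0
        self.decodes = 0  # how many times the decoder had to run

    def fetch(self, address, instruction):
        """Return microOPs for an instruction, decoding only on a cache miss."""
        if address in self.entries:
            self.entries.move_to_end(address)  # mark as most recently used
            return self.entries[address]
        uops = DECODE_TABLE[instruction]
        self.decodes += 1
        # ASSUMPTION: evict least-recently-used entries until the new one fits.
        while self.entries and self.used_uops + len(uops) > self.capacity_uops:
            _, evicted = self.entries.popitem(last=False)
            self.used_uops -= len(evicted)
        self.entries[address] = uops
        self.used_uops += len(uops)
        return uops


# A three-instruction loop body executed 100 times.
loop_body = [(0x100, "add eax, [mem]"), (0x103, "inc ecx"), (0x104, "push ebx")]
cache = TraceCache()
executed_uops = 0
for _ in range(100):
    for address, instruction in loop_body:
        executed_uops += len(cache.fetch(address, instruction))
print(cache.decodes, executed_uops)  # 3 500
```

The loop issues 500 microOPs but the decoder runs only three times, once per instruction on the first pass. Every later iteration is served straight from the cache.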