MULTICORE AND MULTITHREAD CHIP MULTIPROCESSORS
Sun Microsystems soon will offer Niagara2, its next-generation low-power multicore server CPU (Fig. 7). With eight cores each running up to eight threads, it can execute up to 64 threads simultaneously. Each core has a dedicated floating-point unit, and all eight cores share the L2 cache.
Built on a 65-nm process with an improved pipeline, Niagara2 also will come standard with four integrated memory controllers supporting fully buffered dual in-line memory modules (FBDIMMs), two integrated 10/1-Gbit Ethernet ports, and an x8 PCI Express port, and it promises substantially better single-thread performance. Niagara2 also will integrate cryptographic functions: each core supports the AES and RC4 ciphers, the SHA and MD5 hash functions, and the RSA and elliptic curve cryptography (ECC) public-key algorithms.
With Niagara2, the cores share no execution resources; the only resources they compete for are memory and the L2 cache. Each thread has its own copy of the registers and is scheduled by the operating system as a lightweight process. Threads on the same core compete for the L1 cache, the address translation buffers, and the ALU. As a result, well-threaded applications can scale nearly linearly across threads.
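The sharing rules above can be modeled in a few lines. This is a hypothetical sketch: it assumes hardware threads are numbered contiguously (threads 0 to 7 on core 0, 8 to 15 on core 1, and so on), which the article does not state, and the helper names `core_of`, `shared_resources`, and `partition` are illustrative only. The `partition` helper shows the kind of even work split across all 64 threads that near-linear scaling depends on.

```python
"""Sketch: which of Niagara2's 64 hardware threads contend for which resources.

Assumption (hypothetical): threads are numbered contiguously per core.
Real operating systems may number hardware threads differently.
"""

CORES = 8
THREADS_PER_CORE = 8
TOTAL_THREADS = CORES * THREADS_PER_CORE  # 8 x 8 = 64


def core_of(thread_id: int) -> int:
    """Return the core hosting a hardware thread (contiguous numbering assumed)."""
    if not 0 <= thread_id < TOTAL_THREADS:
        raise ValueError(f"thread_id must be in [0, {TOTAL_THREADS})")
    return thread_id // THREADS_PER_CORE


def shared_resources(a: int, b: int) -> set[str]:
    """Resources two hardware threads contend for, per the description above."""
    shared = {"L2 cache", "memory"}  # shared by every core
    if core_of(a) == core_of(b):
        # Threads on the same core also compete for per-core resources.
        shared |= {"L1 cache", "address translation buffers", "ALU"}
    return shared


def partition(n_items: int, n_workers: int = TOTAL_THREADS) -> list[range]:
    """Split n_items into n_workers contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers take one more item
        chunks.append(range(start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    print(core_of(13))                  # thread 13 sits on core 1 under the assumption
    print(sorted(shared_resources(0, 8)))
    print([len(c) for c in partition(1000)][:4])
```

Threads on different cores contend only for the L2 cache and memory, so spreading work across cores before packing threads onto one core reduces L1 and ALU contention.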
Competing processors draw an average of 150 W, but Sun engineered the original Niagara to draw just 70 W. With data-center power, cooling, and space costs all on the rise, those savings will be a welcome change.
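The two power figures above support a quick back-of-the-envelope comparison. This sketch uses only the 150 W and 70 W values from the text; it assumes each processor runs continuously at that draw all year, which real servers rarely do, and it ignores cooling overhead, so treat the result as a rough per-processor estimate.

```python
"""Back-of-the-envelope power comparison using the figures quoted above.

Assumption: constant draw, 24 hours a day, 365 days a year.
"""

HOURS_PER_YEAR = 24 * 365  # 8760 h of continuous operation (assumed)


def savings_fraction(baseline_w: float, new_w: float) -> float:
    """Fraction of the baseline power saved by the lower-power part."""
    return (baseline_w - new_w) / baseline_w


def annual_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Energy in kilowatt-hours consumed by a constant load over `hours`."""
    return watts * hours / 1000.0


competitor_w, niagara_w = 150.0, 70.0
saved_w = competitor_w - niagara_w  # 80 W per processor

print(f"{savings_fraction(competitor_w, niagara_w):.0%} less power")  # prints: 53% less power
print(f"{annual_kwh(saved_w):.1f} kWh saved per processor per year")  # prints: 700.8 kWh ...
```

Multiplying the per-processor figure by the processor count and a local electricity rate gives a first-order cost estimate; cooling savings would come on top of it.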
"It's time the technology industry took a stand. Tripling your data-center performance shouldn't mean tripling your power bill and needing more coal-fired power plants. It's becoming more obvious by the day that extreme efficiency is good for the environment and good for business," says Jonathan Schwartz, president and chief operating officer of Sun Microsystems.
"There are proof points everywhere, from hybrid auto companies that can't keep up with demand to fuel-efficient aircraft dominating the marketplace," he continues. "Customers want this same eco-responsibility in their data centers. Our UltraSPARC T1 systems deliver radical performance improvement without the sticker shock of energy costs associated with IBM's Power-based systems."
Designers can try out either of the Niagara (UltraSPARC T1) Sun Fire servers free for 60 days. The base-model T1000 lists for $3495, and the T2000 lists for $9045.
Meanwhile, AMD's Opteron is the only x86 chip on the market that combines a dual-core 64-bit x86 architecture with an integrated Northbridge. The company is expanding this line, and by next year it will offer a 65-nm Opteron with:
True quad-core die
Four 16-bit or eight 8-bit HyperTransport links
Enhanced branch prediction
Out-of-order load execution
Up to 4 double-precision (DP) FLOPs per cycle per core
Dual 128-bit SSE data flow
Dual 128-bit loads per cycle
Bit-manipulation extensions (LZCNT/POPCNT)
SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
Enhanced Direct Connect Architecture and Northbridge
HyperTransport 3.0 (HT-3) links (up to 5.2 GT/s)
Enhanced crossbar
DDR2 with migration path to DDR3
FBDIMM when appropriate
Enhanced power management
Enhanced RAS (reliability, availability, and serviceability)
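The LZCNT and POPCNT entries in the list above are single instructions that count bits. The sketch below is a portable reference for what they compute on 32-bit operands; it does not emit the instructions, and the function names `popcnt32` and `lzcnt32` are hypothetical. One detail worth noting: the older BSR instruction leaves its destination undefined for a zero input, whereas LZCNT returns the operand width (32 here), which the sketch reproduces.

```python
"""Reference semantics for the LZCNT and POPCNT bit-manipulation instructions.

Portable sketch of the results on 32-bit operands; no hardware instructions
are used. Function names are illustrative only.
"""

MASK32 = 0xFFFF_FFFF  # keep only the low 32 bits, like a 32-bit register


def popcnt32(x: int) -> int:
    """POPCNT: number of set bits in the low 32 bits of x."""
    return bin(x & MASK32).count("1")


def lzcnt32(x: int) -> int:
    """LZCNT: leading zero bits of a 32-bit value; returns 32 when the value is 0."""
    return 32 - (x & MASK32).bit_length()


if __name__ == "__main__":
    for value in (0, 1, 0b1011, 0x8000_0000):
        print(hex(value), popcnt32(value), lzcnt32(value))
```

A typical use of LZCNT is finding the highest set bit (31 - lzcnt32(x) for nonzero x), for example to compute a floor of log2 or to size power-of-two buffers.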
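Two list items lend themselves to peak-rate arithmetic. The core count, 4 DP FLOPs per cycle, 16-bit link width, and 5.2 GT/s rate come from the list above; the 2.0 GHz clock is a hypothetical placeholder because the article gives no frequency. The sketch also assumes the 4 FLOPs/cycle figure arises from one 128-bit add and one 128-bit multiply per cycle, each operating on two 64-bit doubles, consistent with the dual 128-bit SSE data flow.

```python
"""Peak-rate arithmetic for the FLOPs and HT-3 items in the feature list.

HYPOTHETICAL_CLOCK_GHZ is an assumed value; the article gives no clock speed.
"""


def peak_gflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak: every core retires its maximum FLOPs on every cycle."""
    return cores * flops_per_cycle * clock_ghz


def link_gbytes_per_s(width_bits: int, gigatransfers: float) -> float:
    """One-direction link bandwidth: bytes per transfer times transfers per second."""
    return (width_bits / 8) * gigatransfers


# A 128-bit SSE register holds two 64-bit doubles. Assuming one 128-bit add and
# one 128-bit multiply issue per cycle, a core performs 2 + 2 = 4 DP FLOPs/cycle.
DP_PER_128BIT = 128 // 64
FLOPS_PER_CYCLE = 2 * DP_PER_128BIT  # add pipe + multiply pipe

HYPOTHETICAL_CLOCK_GHZ = 2.0
CORES = 4  # true quad-core die

peak = peak_gflops(CORES, FLOPS_PER_CYCLE, HYPOTHETICAL_CLOCK_GHZ)
per_direction = link_gbytes_per_s(16, 5.2)  # 16-bit HT-3 link at 5.2 GT/s

print(f"{peak:.1f} GFLOPS peak at the assumed clock")
print(f"{per_direction:.1f} GB/s per direction, {2 * per_direction:.1f} GB/s both directions")
```

These are theoretical ceilings; sustained throughput depends on instruction mix, memory bandwidth, and how well code keeps both 128-bit pipes busy.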
Each core also will integrate its own L1 and L2 caches, and all cores will share an L3 cache. The dedicated L1 cache will help keep critical data local, reduce access latency, and deliver a hit rate approaching 95%. The dedicated L2 cache will help avoid the conflicts common in shared caches, while the shared L3 cache is designed to optimize memory use in a multicore environment.
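The effect of that roughly 95% L1 hit rate can be seen with the standard average memory access time (AMAT) formula, applied level by level: AMAT = hit time + miss rate x (AMAT of the next level). In this sketch only the 95% L1 hit rate comes from the article; every latency and the L2 and L3 hit rates are hypothetical placeholders, and the function name `amat` is illustrative.

```python
"""Average memory access time (AMAT) for a three-level cache hierarchy.

Only the ~95% L1 hit rate comes from the article; all latencies and the
L2/L3 hit rates are hypothetical values chosen for illustration.
"""


def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """AMAT for caches checked in order, each given as (hit_time, local_hit_rate).

    Works from memory outward: AMAT_i = hit_time_i + (1 - hit_rate_i) * AMAT_(i+1).
    """
    result = memory_latency
    for hit_time, hit_rate in reversed(levels):
        result = hit_time + (1.0 - hit_rate) * result
    return result


# Hypothetical latencies in CPU cycles; hit rates are local to each level.
hierarchy = [
    (3.0, 0.95),   # L1: ~95% hit rate per the article; 3-cycle latency assumed
    (15.0, 0.80),  # L2: assumed
    (40.0, 0.50),  # L3: assumed
]

print(f"AMAT = {amat(hierarchy, memory_latency=200.0):.2f} cycles")
```

Because every L1 miss pays the full cost of the lower levels, dropping the L1 hit rate from 95% to 90% doubles the miss term, which is why a high L1 hit rate matters so much.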
The Opteron targets the small and medium-sized server market. It delivers the greatest benefits for fast database transactions, serving multiple e-commerce users at once, graphics-intensive tools such as CAD and digital content creation (DCC), and processor-intensive financial and scientific applications. Pricing for the dual-core Opteron chips ranges from $316 for the Model 265 to $2149 for the Model 885 in 1000-unit quantities.
"The success of the 64-bit dual-core AMD Opteron processor in the server space is due in large measure to innovations in the Northbridge," says Pat Conway, principal member of AMD's technical staff.
"AMD's new processor interface balances system traffic across multiple high-bandwidth HyperTransport ports and the integrated memory controller reduces memory latency," Conway continues. "AMD's Direct Connect Architecture helps lower power and cost by completely eliminating the need for external glue chips like switches and memory controllers."