Electronic Design

  


[Technology Report]
Smaller Servers, Larger Performance
Thanks to the latest architectures, designers can pack more processors into less space. But parallel processing may have to wait until the software catches up with the hardware.

William Wong  |   ED Online ID #12839  |   June 29, 2006


It all began in 1952, when the ILLIAC I (Illinois Automatic Computer) graced the stage at the University of Illinois. By 1956, this machine had more compute power than all of Bell Labs. That's not bad for a 4.5-ton, 10- by 2- by 8.5-ft box filled with more than 2800 vacuum tubes and 64-kword drum storage. Eventually, the infamous ILLIAC IV vector processor incorporated 256 processors in its design.

The latest single-chip, multicore processors run rings around these dinosaurs. But the search for faster, better solutions continues unabated. The industry has made great progress in larger symmetrical multiprocessing (SMP) systems, and typical high-end servers host over two dozen processors. Multiple cores per processor effectively increase the number of processors. Yet moving into the hundreds to thousands range requires a change of architecture.

NUMA, or non-uniform memory access, retains a common memory that every node can address. A node's local memory remains the fastest to reach, and access times grow as the memory being accessed sits farther from the node. NUMA's big problem, though, is programming.
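To see why data placement matters to the programmer, consider a rough sketch that averages access time over the share of accesses served at each distance. The latency figures and the function name are made-up placeholders for illustration, not measurements of any real system.

```python
def average_access_ns(access_share, latency_ns):
    """Weighted-average memory latency.

    access_share[d] is the fraction of accesses served d hops away
    (0 = the node's local memory); latency_ns[d] is the assumed cost.
    """
    if abs(sum(access_share) - 1.0) > 1e-9:
        raise ValueError("access shares must sum to 1")
    return sum(s * t for s, t in zip(access_share, latency_ns))


# Hypothetical latencies: local, one hop away, two hops away.
LATENCY_NS = [100, 150, 200]

# Same hardware, two different data layouts.
mostly_local = average_access_ns([0.9, 0.1, 0.0], LATENCY_NS)
scattered = average_access_ns([0.4, 0.3, 0.3], LATENCY_NS)
```

Under these assumed numbers, the scattered layout pays noticeably more per access than the mostly local one, even though the hardware is identical. Keeping data close to the node that uses it is the part that falls to the software.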

The NUMA architecture has worked well in AMD's Opteron. Each chip has its own memory interface, but its HyperTransport links can be used to access memory attached to other chips. Because the links are fast, this works well when the memory being accessed is attached to an adjacent chip. But the approach reverts to a typical NUMA system when hundreds of nodes are used.
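The effect of node count on distance can be sketched by counting link hops with a breadth-first search. The ring topology below is an assumption chosen for simplicity; it is not a description of how any particular Opteron system is wired.

```python
from collections import deque


def hops_from(start, links):
    """Breadth-first search: link hops from `start` to every chip."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        chip = queue.popleft()
        for neighbor in links[chip]:
            if neighbor not in dist:
                dist[neighbor] = dist[chip] + 1
                queue.append(neighbor)
    return dist


def ring(n):
    """Hypothetical topology: n chips, each linked to two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


four_chips = hops_from(0, ring(4))    # small system
many_chips = hops_from(0, ring(64))   # larger system, same link scheme

worst_small = max(four_chips.values())
worst_large = max(many_chips.values())
```

In the four-chip ring, every other chip's memory is at most two hops away. In the 64-chip ring, the farthest memory is 32 hops away, so the fast neighbor-to-neighbor link no longer hides the distance.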

A mixture of different application requirements has yielded a plethora of designs. These range from massive supercomputer complexes that are tied together by high-speed fabrics to clusters of blade servers connected by Ethernet.

Compute engines these days are based on standard platforms such as Intel's EM64T and IA64 processors, AMD's Athlon 64 and Opteron, and Sun UltraSparc processors. Similarly, standards like Ethernet, Serial RapidIO (sRIO), and InfiniBand provide the interconnect fabric. And in software, standards are slowly improving the developer's ability to employ these hardware features.

Super Apps
High-performance computing (HPC) tends to cover everything these days, from supercomputing applications like weather prediction and earthquake modeling to clusters of Web servers. Systems like the Cray XT3 use the hypercube architecture to take advantage of dual-core AMD Opteron processors (Fig. 1 and Fig. 2). Hypercubes offer scalability, but dataflow and routing become issues that programmers must address.
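The routing issue is easier to picture with a small sketch. In a hypercube, each node's address differs from each of its neighbors' addresses in exactly one bit. One well-known scheme, dimension-order (e-cube) routing, corrects the differing bits one at a time. The function names here are illustrative, and nothing below is meant to describe the routing used in any specific machine.

```python
def neighbors(node, dimensions):
    """Nodes whose address differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def ecube_route(src, dst, dimensions):
    """Dimension-order routing: fix differing bits from lowest to highest."""
    path = [src]
    current = src
    for bit in range(dimensions):
        mask = 1 << bit
        if (current ^ dst) & mask:   # this address bit still differs
            current ^= mask          # hop along that dimension
            path.append(current)
    return path


# Route across a 3-dimensional (8-node) hypercube.
route = ecube_route(0b000, 0b101, 3)
hop_count = len(route) - 1
```

The number of hops equals the number of address bits that differ between source and destination. Adding a dimension doubles the node count but adds only one hop to the worst case, which is where the scalability comes from.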

Different connection architectures like the hypercube have been giving way to fabric interconnects like Ethernet, sRIO, InfiniBand, and ASI (Advanced Switching Interconnect). These standards-based solutions cost less. Also, their performance has improved steadily as products mature.

InfiniBand, one of the most mature of these products, has found a niche in HPC. Some of the largest and fastest supercomputers are based on an InfiniBand interconnect. Mellanox's 480-Gbit/s InfiniScale III switch chip can be found at the center of many of these fabrics (see "Switch-Chip Fuels Third-Generation InfiniBand" at www.electronicdesign.com, ED Online 5999). It can be configured as an eight-port, 30-Gbit/s, 12x InfiniBand switch or as numerous 4x ports. Its low 200-ns latency is critical to efficient HPC applications.
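The port and chip figures can be cross-checked with a little arithmetic. This sketch assumes the 2.5-Gbit/s per-lane signaling rate of single-data-rate InfiniBand, and it assumes the 480-Gbit/s figure counts both directions of every port. The 24-port 4x configuration is likewise an assumed example of "numerous 4x ports."

```python
LANE_GBITS = 2.5  # assumed signaling rate per lane, per direction


def port_gbits(lanes):
    """One-direction signaling rate of a port with the given lane count."""
    return lanes * LANE_GBITS


def switch_aggregate_gbits(ports, lanes):
    """Aggregate rate, counting both directions of every port (assumption)."""
    return ports * port_gbits(lanes) * 2


rate_12x = port_gbits(12)                      # one 12x port
eight_12x = switch_aggregate_gbits(8, 12)      # eight-port 12x configuration
many_4x = switch_aggregate_gbits(24, 4)        # assumed 24-port 4x configuration
```

Under these assumptions, a 12x port works out to 30 Gbit/s, and both configurations add up to the same 480-Gbit/s aggregate.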

Of course, even InfiniBand can go one better with devices like the PathScale 10X-MR PCI Express adapter (see "InfiniBand Hits 10M Messages/s" at ED Online 12359). Its connectionless architecture avoids the queue pairs used with the usual OpenIB stack, allowing a node to handle up to 10 million messages/s.




