Electronic Design

  


[Engineering Feature]
Games Flourish In A Parallel Universe
Multicore processors accelerate games if developers can take advantage of the features and live with the limitations.

William Wong  |   ED Online ID #15745  |   June 21, 2007


Gaming platforms like Microsoft's Xbox 360 (see figure) and Sony's PlayStation 3 (see figure) push the envelope when it comes to graphics and computation, delivering sophisticated and realistic games. Their latest multicore 64-bit processing architectures let programmers create complex, multithreaded applications.

The computational processors are tightly integrated with the graphical processing units, minimizing system response time for a better gaming experience. Even small delays can disrupt the flow of a game or its multimedia presentation. Performance and balance on both the hardware and software fronts will provide an optimal gaming experience.

Gamers tend to grade a system on the basis of the game's playing capabilities, regardless of how well it takes advantage of the underlying hardware. Still, looking under the hood shows each system's potential. As with most programming platforms, applications rarely take full advantage of the hardware the first time around. It takes time to learn about system idiosyncrasies and to mold application frameworks to exploit the hardware.

Game developers have an additional challenge because game vendors often target multiple platforms with the same game. Obviously, this is desirable from a vendor's perspective, because it widens the market. Unfortunately, even slight differences in platforms or their capabilities can significantly impact the software.

The differences between Microsoft's and Sony's platforms are quite substantial, so a seemingly minor problem potentially becomes major. The Xbox 360 uses a more conventional symmetrical multiprocessing (SMP) architecture. Sony's PlayStation 3 is built on IBM's Cell processor. The Cell forgoes large caches for its eight Synergistic Processing Elements (SPEs), forcing application programmers to use software-based caching support.

THE SYMMETRICAL APPROACH
Microsoft and IBM developed a multicore chip based on the Power architecture (Fig. 1). Its three 3.2-GHz processing cores are identical, and each has its own 32-kbyte L1 instruction and data caches. The two-way, set-associative caches include parity error checking on the 128-byte lines.

Each core can run two threads. The processing cores share a 1-Mbyte L2 cache, and this cache has an interesting architecture. Half of the cache runs at the processors' clock frequency, while the rest of the L2 cache runs at 1.6 GHz. The design gets more interesting still with the addition of a new instruction called Extended Data Cache Block Touch.

The instruction is designed to prefetch data from main memory directly into the L1 cache. It's often easier to take advantage of this instruction in a gaming environment, where the size and use of data are well-defined. Because the prefetched data goes straight to the L1 cache, it reduces L2 thrashing and can be used to quickly build up a thread's working set. In a conventional processor, the working set is brought in incrementally, slowing down the overall thread operation.
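Extended Data Cache Block Touch is specific to the Xbox 360 processor, so it can't be shown in portable code. As a rough analogue only, the sketch below uses the `__builtin_prefetch` hint offered by GCC and Clang, which makes no promise about which cache level the data lands in. The function name `sum_with_prefetch` and the look-ahead distance are hypothetical, chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tuning value: how many elements ahead of the current
// position to request. A real value would come from profiling.
constexpr std::size_t kPrefetchDistance = 64;

// Sums an array while hinting the hardware to start loading data that
// the loop will need shortly. The result is the same with or without
// the hint; only the timing may change.
std::uint64_t sum_with_prefetch(const std::uint32_t* data, std::size_t count) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + kPrefetchDistance < count) {
            // Arguments: address, 0 = read access, 3 = high temporal locality.
            __builtin_prefetch(&data[i + kPrefetchDistance], 0, 3);
        }
#endif
        total += data[i];
    }
    return total;
}
```

This works best in the situation the article describes: the program knows the size and access order of its data ahead of time, so it can ask for the data before the loop actually needs it.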

The processing chip accesses main memory through the front-side bus connected to the graphics chip. The front-side bus runs at 5.4 GHz with a bandwidth of 21.6 Gbytes/s. The graphics chip provides a unified memory system to the on-chip graphics processing unit (GPU) and the Power cores in the processing chip. The GPU can read data directly from the L2 cache for even better interaction with application code.
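The two front-side bus figures can be turned into a quick budget. The sketch below assumes decimal units (1 Gbyte = 1000 Mbytes) and treats the quoted bandwidth as a combined total. The frame rate is a value the caller supplies, not a figure from the platform.

```cpp
#include <cstdint>

// Figures quoted in the article, expressed in decimal units.
constexpr std::uint64_t kBusBandwidthMBps = 21'600;  // 21.6 Gbytes/s
constexpr std::uint64_t kBusClockMHz = 5'400;        // 5.4 GHz

// Bytes moved per bus clock, taking the quoted bandwidth as a total.
constexpr std::uint64_t bytes_per_clock() {
    return kBusBandwidthMBps / kBusClockMHz;
}

// Upper bound, in Mbytes, on the data that can cross the bus in one frame.
// The frame rate is a caller-supplied assumption, not a quoted figure.
constexpr std::uint64_t megabytes_per_frame(std::uint64_t frames_per_second) {
    return kBusBandwidthMBps / frames_per_second;
}

static_assert(bytes_per_clock() == 4, "21.6 Gbytes/s at 5.4 GHz is 4 bytes per clock");
```

At an assumed 60 frames/s, for instance, `megabytes_per_frame(60)` gives a ceiling of 360 Mbytes per frame, before any protocol overhead or contention is taken into account.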

The processors also support cacheable and cache-inhibited store operations, which are handled by different pipelines. The cacheable operations use eight store-gathering, nonsequential buffers per core, while the non-cacheable operations use four sequential buffers. By understanding these instructions, developers can optimize their applications.

For example, data written to main memory for use by the GPU will often benefit from bypassing the cache if the application threads no longer need to access this data. Running the data through the cache would simply flush data that might be useful later. However, the cache isn't the only concern for software developers.
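The Xbox 360's cache-inhibited stores aren't available in portable code. The closest widely available analogue is the non-temporal store on x86 processors, shown here through the SSE2 intrinsics `_mm_stream_si128` and `_mm_sfence`. This is a sketch of the idea on a different architecture, not the console's mechanism, and the function name `copy_bypassing_cache` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_stream_si128, _mm_sfence
#endif

// Copies `bytes` bytes into a buffer the CPU will not read again soon.
// Preconditions: `dst` is 16-byte aligned and `bytes` is a multiple of 16.
void copy_bypassing_cache(void* dst, const void* src, std::size_t bytes) {
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    assert(bytes % 16 == 0);
#if defined(__SSE2__)
    auto* out = static_cast<__m128i*>(dst);
    const auto* in = static_cast<const __m128i*>(src);
    for (std::size_t i = 0; i < bytes / 16; ++i) {
        // Unaligned load from the source, non-temporal store to the target.
        _mm_stream_si128(out + i, _mm_loadu_si128(in + i));
    }
    // Order the streamed stores ahead of any stores that follow.
    _mm_sfence();
#else
    // Fallback for other targets: an ordinary copy that goes through the cache.
    std::memcpy(dst, src, bytes);
#endif
}
```

The trade-off is the one described above. Skipping the cache only pays off when the writing thread won't touch the data again. Otherwise the next read has to go all the way back to main memory.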

Each processing core includes a VMX128 (Vector/SIMD Multimedia eXtension) unit. The VMX128 was specifically designed to accelerate 3D graphics and game physics. Developers can benefit from this feature because it was built on the VMX accelerator, which is already found in many Power architecture cores like those in Apple's G4 and G5 Power Macs. Enhancing SIMD support in a compiler is a relatively straightforward process and typically allows a programmer to exploit the underlying hardware without significantly modifying the software.
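To show what "without significantly modifying the software" can look like, the sketch below is a plain loop written so that a vectorizing compiler has an easy job. It uses no VMX128 intrinsics, and whether the compiler actually emits SIMD instructions depends on the compiler and its flags. The `Particles` structure and `integrate` function are hypothetical.

```cpp
#include <cstddef>

// Structure-of-arrays layout: each component is contiguous in memory,
// which is the pattern vectorizing compilers tend to handle best.
struct Particles {
    float* x;           // positions
    float* vx;          // velocities
    std::size_t count;  // number of particles
};

// Advances every particle by one time step: x += vx * dt.
// Each iteration is independent, so several can be processed at once.
void integrate(Particles& p, float dt) {
    for (std::size_t i = 0; i < p.count; ++i) {
        p.x[i] += p.vx[i] * dt;
    }
}
```

The source code stays the same across platforms. Only the compiler's choice of instructions changes, which is why SIMD support of this kind ports with little effort.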

There are significant advantages to Microsoft's more conventional gaming hardware approach. SMP with multilevel, transparent coherent caches is standard fare on PCs. Thus, it's significantly easier to develop multithreaded applications that will run on different platforms, often with minimal application architectural changes other than recompilation. The same is true for utilization of VMX128, since this support is often hidden by the compiler.
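As a minimal sketch of that portability, the function below splits a job across worker threads using only standard C++ (`std::thread`), so it depends on nothing platform-specific. The function name `parallel_sum` is hypothetical. A worker count of six would match the three cores with two threads each described earlier, but that is the caller's choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Sums `data` by giving each worker thread one contiguous slice.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data, unsigned workers) {
    workers = std::max(1u, workers);
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    // Slice size, rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, w, begin, end] {
            std::uint64_t local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w] = local;  // each worker writes only its own slot
        });
    }
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;
    for (std::uint64_t v : partial) total += v;
    return total;
}
```

Because the hardware keeps the caches coherent, the workers need no explicit data movement. Each thread simply reads its slice of shared memory, which is the property that makes this style easy to carry between SMP platforms.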

