Electronic Design

  


[Design View / Design Solution]
Master On-Chip Embedded Multiprocessor Coherence
Although snoopy virtual-bus approaches are the first step, hybrid snoopy-directory schemes will be the next trend in embedded coherence.

Sanjay Vishin  |   ED Online ID #12219  |   March 30, 2006


Without a doubt, embedded systems-on-a-chip (SoCs) are becoming "software-rich,"1 and they're incorporating more and more processors on one chip. The driving forces behind these changes are advances in fabrication technology (propelled by Moore's Law) to address short time-to-market pressures, greater design complexity, and the amortizing of high-cost ASIC fabrication through design reuse.

There's also the economic benefit of higher performance with backward-compatibility to a single-threaded model of computation (the so-called Von Neumann model). That model has long plagued general-purpose computing. Now, such a performance benefit becomes applicable to high-throughput, software-rich embedded SoCs. Examples include high-end set-top boxes, smart phones, automotive media centers, and printer/copier stations.

Current high-end embedded SoCs are mostly heterogeneous. The processors on these SoCs communicate through noncoherent, shared memory using some form of message passing. The classic RISC/DSP combination in a third-generation cell phone communicating through a dual-ported SRAM and interrupts represents a good example of these simple schemes.

When sheer clock-speed scaling ran out of steam, maintaining this single-threaded programming abstraction forced general-purpose uniprocessor designers to resort to dual- or quad-processor coherent systems. The same will happen for these software-rich, high-performance embedded systems—with slight modifications.

Future high-performance SoCs will be hierarchical and heterogeneous systems of processors with coherent clusters of homogeneous multiprocessors embedded in the hierarchy. Some of this transition already has been observed in one specific high-performance embedded market: networking (in the form of coherent network multiprocessors).2,3

The exact nature of future embedded chip multiprocessors (CMPs) is debatable (heterogeneous versus heterogeneous with hierarchical homogeneous processors). But for many of them, shared memory with coherence will be an important issue.

Definition And Basics
A multicore shared-memory system with caches is considered to be cache-coherent if the value returned by any Load (issued by a processor) is always the value of the latest Store to that memory location. To address the ambiguity of the term "latest Store," we're forced to take a small diversion into memory models. We use the help of a common memory model like sequential consistency (SC), where the results of any execution of a parallel program on an SC system make it possible to construct a global serial order of all operations (mainly Loads and Stores) to a location. Then coherence implies:

  • The order of Loads and Stores from each processor appears in the system's global serial order in the same way in which they were issued to the memory system by that processor.
  • The value returned by each read from a processor in the system is the value written by the last write to that location in the global serial order.
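The two conditions above can be checked mechanically for a small trace. The following is a minimal sketch, not something taken from the article: the `Op` record, the `is_coherent` function, and the example trace are all hypothetical names introduced for illustration, and the check assumes a proposed global serial order is already given.

```python
from collections import namedtuple

# One memory operation: issuing processor, kind ("load" or "store"),
# address, and the value stored or the value the load returned.
Op = namedtuple("Op", "proc kind addr value")

def is_coherent(global_order, program_orders, initial=0):
    """Check the two coherence conditions against a proposed global order."""
    # Condition 1: each processor's operations appear in the global order
    # in the same order in which that processor issued them.
    for proc, issued in program_orders.items():
        seen = [op for op in global_order if op.proc == proc]
        if seen != list(issued):
            return False
    # Condition 2: every load returns the value of the last store to the
    # same location that precedes it in the global order.
    memory = {}
    for op in global_order:
        if op.kind == "store":
            memory[op.addr] = op.value
        elif memory.get(op.addr, initial) != op.value:
            return False
    return True

# Hypothetical trace: two processors race to store to x, then both load 2.
s1 = Op("P0", "store", "x", 1)
s2 = Op("P1", "store", "x", 2)
l_a = Op("P0", "load", "x", 2)
l_b = Op("P1", "load", "x", 2)
program_orders = {"P0": [s1, l_a], "P1": [s2, l_b]}

good = [s1, s2, l_a, l_b]   # the store of 2 is last, so both loads are legal
bad = [s2, s1, l_a, l_b]    # the store of 1 is last, so loading 2 is illegal

print(is_coherent(good, program_orders))
print(is_coherent(bad, program_orders))
```

An execution counts as coherent if at least one such global order exists; this sketch only validates a single candidate order rather than searching for one.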

Therefore, the term "global serial order" is a product of the memory consistency model (memory model for short) implemented by the system (informally termed weak, strong, and so on). The memory model plays the same role for a multiprocessor that the instruction set architecture (ISA) plays for a single processor, where the ISA defines the operational contract between the compiler and the hardware (Fig. 1).

The memory model defines the contract between the programmer and the memory system for a multiprocessor or, more generally speaking, a multithreaded system. Hence, multithreaded languages like Java also have a defined memory model. In this article, most occurrences of multiprocessing can be substituted with multithreading.

SC, total store ordering (TSO), and processor consistency (PC) are some of the common memory models at the machine level (from strong to weak). Stronger implies that more constraints are imposed on the parallel memory-system implementer, which makes the tasks performed by the parallel middleware or system-library writer a bit simpler.

Another way to look at coherence is that it's the weakest form of memory consistency, since it doesn't restrict memory operations any more than what is necessary to provide a reasonable memory system from a single-processor point of view. Informally, stronger models help the programmer by ensuring that a parallel memory system guarantees more than just "Reads return the value from the latest Store." These added guarantees are typically used to form efficient synchronizing constructs between threads or processors.
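The difference between a stronger and a weaker model can be illustrated with the classic store-buffering test: processor 0 stores 1 to x and then loads y, while processor 1 stores 1 to y and then loads x. The sketch below is a simplified model written for this illustration, not a formal definition of SC or TSO. It assumes that the only relaxation TSO adds is a per-processor store buffer that lets a load complete before an earlier store becomes globally visible. All function names are hypothetical.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slots = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run(schedule):
    """Execute one schedule; return (r0, r1), the values the two loads saw."""
    memory = {"x": 0, "y": 0}
    buffers = {0: {}, 1: {}}   # per-processor store buffer (one entry suffices)
    regs = {}
    for proc, kind, addr in schedule:
        if kind == "store":    # the store enters the local buffer only
            buffers[proc][addr] = 1
        elif kind == "flush":  # the buffered store becomes globally visible
            memory[addr] = buffers[proc].pop(addr)
        else:                  # load: check own buffer first, then memory
            regs[proc] = buffers[proc].get(addr, memory[addr])
    return regs[0], regs[1]

def outcomes(allow_late_flush):
    """Collect every (r0, r1) result of the store-buffering test."""
    def variants(proc, st_addr, ld_addr):
        st = (proc, "store", st_addr)
        fl = (proc, "flush", st_addr)
        ld = (proc, "load", ld_addr)
        yield [st, fl, ld]         # store is visible before the load (SC)
        if allow_late_flush:
            yield [st, ld, fl]     # load overtakes the buffered store (TSO)
    results = set()
    for p0 in variants(0, "x", "y"):
        for p1 in variants(1, "y", "x"):
            for schedule in interleavings(p0, p1):
                results.add(run(schedule))
    return results

sc = outcomes(allow_late_flush=False)
tso = outcomes(allow_late_flush=True)
print(sorted(sc))
print(sorted(tso - sc))
```

In this model, the stronger setting never lets both loads return 0, while the weaker one does. That extra guarantee is the kind of property a library writer relies on when building synchronization without explicit fences.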

To achieve coherence, a system must have a few essential properties. For one, Writes to a particular memory location must be serialized at some point in the system. Note that serialization is a logical concept. For some high-performance speculative implementations, it's only a guideline for retiring transactions during commit. It's similar to "out-of-order" processors, which maintain a temporary state and an "architectural state" separated by a commit point.

Another property of coherent systems is Write propagation, which implies that a Write needs to eventually propagate to all agents that care about the new value. The third important property (a result of the memory model rather than coherence) is Write atomicity, which implies that a Write needs to be propagated in its entirety to all processors in the system after it is serialized.
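A toy model can make serialization and propagation concrete. The sketch below assumes a single ordering point, loosely in the spirit of a shared bus, that pushes every Write to every cache. The `Cache` and `OrderingPoint` classes are hypothetical and do not describe any particular protocol or product.

```python
class Cache:
    """A toy cache that records every value it observes for each address."""
    def __init__(self, name):
        self.name = name
        self.lines = {}      # addr -> current value
        self.history = {}    # addr -> values in the order they were observed

    def observe(self, addr, value):
        self.lines[addr] = value
        self.history.setdefault(addr, []).append(value)

class OrderingPoint:
    """Hypothetical single point at which all writes are serialized."""
    def __init__(self, caches):
        self.caches = caches
        self.log = []        # global order of (addr, value) writes

    def write(self, addr, value):
        # Write serialization: the position in the log fixes the order.
        self.log.append((addr, value))
        # Write propagation: every cache is notified, in log order. The loop
        # finishes before the next write is accepted, so in this toy model
        # each write also reaches all caches as a unit.
        for cache in self.caches:
            cache.observe(addr, value)

caches = [Cache("P0"), Cache("P1"), Cache("P2")]
bus = OrderingPoint(caches)
bus.write("x", 1)    # say, issued by P0
bus.write("x", 2)    # say, issued by P1, racing with the first write
bus.write("y", 7)

histories = [c.history["x"] for c in caches]
print(histories)
```

Because every Write passes through one log, all caches observe the Writes to x in the same order. A real implementation would typically invalidate or update only the caches that hold the line and would overlap transactions, which is where the logical nature of serialization matters.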


    Reader Comments

    Full of buzzwords but the explanations were very good as are the footnoted references.

    Don Wilde -April 05, 2006   (Article Rating: )
