Electronic Design

  
[Design View / Design Solution]
Master On-Chip Embedded Multiprocessor Coherence
Although snoopy virtual-bus approaches are the first step, hybrid snoopy-directory schemes will be the next trend in embedded coherence.

Sanjay Vishin  |   ED Online ID #12219  |   March 30, 2006


We will mention only the common way of classifying coherence protocols, which is based on the stable states of the caches in the system. The common states are referred to as "MOESI": Modified, Owned, Exclusive clean, Shared clean, and Invalid. The terms are largely self-explanatory, and details are readily available in textbooks.4
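As a rough illustration of this classification, the five stable states can be written as a small Python enumeration. This is a sketch under the usual textbook reading of MOESI, not any particular vendor's protocol; the helper names (is_dirty, may_write_silently, can_coexist) are hypothetical and chosen only for this example.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

# States in which the cached data is newer than the copy in memory.
DIRTY = {State.MODIFIED, State.OWNED}
# States in which no other cache may hold a valid copy of the line.
SOLE_COPY = {State.MODIFIED, State.EXCLUSIVE}

def is_dirty(state):
    """True if this cache must eventually write the line back."""
    return state in DIRTY

def may_write_silently(state):
    """True if a write needs no coherence transaction (assumed: M and E only)."""
    return state in SOLE_COPY

def can_coexist(a, b):
    """True if two different caches may hold the same line in states a and b."""
    if State.INVALID in (a, b):
        return True
    if a in SOLE_COPY or b in SOLE_COPY:
        return False
    # Remaining combinations involve only Owned and Shared:
    # many sharers are allowed, but at most one owner.
    return not (a is State.OWNED and b is State.OWNED)
```

For instance, can_coexist(State.OWNED, State.SHARED) holds, while can_coexist(State.MODIFIED, State.SHARED) does not, which is the distinction the Owned state adds over a plain MESI classification.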

Related to state-based protocol classification is whether the protocol is update- or invalidate-based. An invalidate-based coherence protocol maintains the invariant that a cache line has only a single owner in the system at any time, so all other copies are invalidated on a Write. In an update-based protocol, all copies of the cache line are updated on a Write.
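The difference between the two policies can be sketched with a toy model in which each cache is a Python dict from address to value. The function name write and the policy strings are assumptions made for this example; a real protocol would also track states and exchange messages rather than edit other caches directly.

```python
def write(caches, writer, addr, value, policy):
    """Apply a write by cache number `writer` under the given policy.

    caches: list of dicts mapping address -> cached value
    policy: "invalidate" or "update"
    """
    if policy not in ("invalidate", "update"):
        raise ValueError(f"unknown policy: {policy}")
    for i, cache in enumerate(caches):
        if i == writer:
            cache[addr] = value          # the writer always gets the new value
        elif addr in cache:
            if policy == "invalidate":
                del cache[addr]          # other copies are dropped
            else:
                cache[addr] = value      # other copies are refreshed in place
    return caches
```

Starting from caches = [{0x40: 1}, {0x40: 1}, {}], a write of 7 by cache 0 leaves only cache 0 holding the line under "invalidate", whereas under "update" caches 0 and 1 both hold the new value. Cache 2, which never held the line, is untouched in both cases.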

Serialization
Many older symmetric-multiprocessing (SMP) (non-CMP) systems used a bus to broadcast transactions to all agents in the system. Each agent could therefore "snoop" the bus, check its own state, and take the proper actions to invalidate or update its copy of the data item. The overlap between the different phases of a transaction was minimal and restricted to in-order slip (pipelining).

But because buses are limited in speed and their bandwidth scales poorly, these rigid snoopy schemes evolved into a couple of newer coherence schemes. At the high end (but still relevant for embedded CMPs, albeit for different reasons), directory-based schemes are common. When there's a low degree of multiprocessing, snoopy "virtual bus" schemes often are the preferred route.

Snoopy virtual-bus serialization uses specialized higher-performance interconnects, especially in the request phase of a transaction, such as a tree of switches or hierarchical rings (Fig. 2). In these systems, the interconnect is responsible for creating the global serial order, even as the design moves from a limiting physical bus to higher-performance (e.g., serial) point-to-point signaling links.
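The idea that the interconnect itself creates the global serial order can be sketched as a single ordering point that stamps each request with a sequence number and delivers it to every agent. The VirtualBus class and its methods are hypothetical names for this illustration; it collapses a tree of switches or a ring hierarchy into one queue and ignores latency, flow control, and the response phase.

```python
from collections import deque

class VirtualBus:
    """Toy ordering point: every agent observes requests in one global order."""

    def __init__(self, n_agents):
        self.pending = deque()                      # requests awaiting ordering
        self.seen = [[] for _ in range(n_agents)]   # per-agent snoop logs
        self.next_seq = 0

    def request(self, agent, addr, kind):
        """An agent sends a request over its point-to-point link."""
        self.pending.append((agent, addr, kind))

    def drain(self):
        """Order the pending requests and broadcast each one to all agents."""
        while self.pending:
            agent, addr, kind = self.pending.popleft()
            msg = (self.next_seq, agent, addr, kind)
            self.next_seq += 1
            for log in self.seen:
                log.append(msg)                     # same message, same position
```

After a few request calls followed by drain, every list in seen is identical, which is the property a physical bus provided for free and a virtual bus has to construct.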

Directory-based schemes,5 on the other hand, perform the serialization at a new construct called a directory. This directory, which usually resides in the memory module, holds the state of the various cache lines in the system. In general, these systems are a great deal less dependent on the network for serialization and ordering than snoopy schemes (virtual or otherwise). Because messages aren't broadcast in directory schemes, they can scale to much larger systems.
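A minimal sketch of the directory idea follows, assuming a simple full-map organization in which the directory records the set of caches holding each line. The Directory class and its read and write methods are hypothetical names for this example; real directories also track an owner, handle races, and bound their storage.

```python
class Directory:
    """Toy full-map directory: one sharer set per cache line."""

    def __init__(self):
        self.sharers = {}   # addr -> set of cache ids holding the line

    def read(self, cache_id, addr):
        """Record that cache_id now holds a copy of the line."""
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        """Make cache_id the sole holder of the line.

        Returns the caches that must be sent a unicast invalidation.
        """
        targets = sorted(self.sharers.get(addr, set()) - {cache_id})
        self.sharers[addr] = {cache_id}
        return targets
```

If caches 0 and 3 have read a line and cache 0 then writes it, write returns [3]: one invalidation message, regardless of how many caches are in the system. A snoopy broadcast would instead reach every other cache, which is the scaling difference described above.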

Another trend affecting on-chip coherence is that next-generation SoCs (with multiple processors) are following a methodology of separating communications from computation, for reasons of complexity mitigation. This has resulted in design methodologies based on networks-on-a-chip (NoCs),6 and the movement from circuit-switched to packet-switched NoCs.7 Any on-chip coherence scheme needs to heed this important move in deep-submicron SoCs and layer the coherence protocol on a packet-switched substrate.

Embedded SoCs have added issues with cost, low power, real-time operation, intellectual-property (IP) ownership, and possibly heterogeneous processors. Consequently, selecting a coherence scheme for them is a bit different than for their general-purpose counterparts. Low power results in lower system cost, which is a sensitive factor for SoCs. Moreover, if an SoC is used in a mobile application, low power certainly becomes a necessity.

Just as it took a while for caches to break into the DSP world (cycle-accurate processor and system simulators were the key tools that helped accelerate this transition), the same is true for coherence. To port software to a real-time system, a coherence/SoC designer must ensure that a sufficiently cycle-approximate (and fast) simulator is available for the application/middleware port. The problem is a bit more severe in high-performance embedded SoCs, since programmers are exposed to the hardware more than in a general-purpose multiprocessor. In the latter, only a restricted set of "system" (middleware, libs, operating system) programmers is exposed to this interface.

IP ownership is a unique feature of embedded SoCs. Most general-purpose CMP vendors' designs don't incorporate any outside IP at the memory-bus level (the level at which coherence is relevant). But outside IP is routine for an embedded-SoC integrator, so much so that even the interconnect (e.g., OCP-IP)8 in many high-performance embedded SoCs is an IP block acquired from an outside IP vendor. Moreover, a high-performance embedded SoC could sometimes benefit from heterogeneous ISA cores sharing the same memory coherently (say, a RISC core and a DSP).

Looking at these trends, the relevance of snoopy virtual-bus coherence schemes to CMPs should be obvious: CMPs need only limited scalability and offer lots of on-chip bandwidth and point-to-point signaling, while snoopy schemes bring less overhead and low latency. But it's interesting that directory schemes, which are generally considered applicable only to large server-class machines, are also relevant to embedded SoCs (with possible modification). That's because they can work with unordered interconnects, heterogeneous ISAs, lower-power unicast transactions, etc.

While the first generations of embedded CMPs may opt for just a snoopy virtual-bus scheme, more interesting hybrid snoopy-directory schemes are likely to be the next trend in embedded coherence. That's because designers will come to appreciate the modularity benefits of directory-based schemes.


