Electronic Design

  

[Design View / Design Solution]
Master On-Chip Embedded Multiprocessor Coherence
Although snoopy virtual-bus approaches are the first step, hybrid snoopy-directory schemes will be the next trend in embedded coherence.

Sanjay Vishin  |   ED Online ID #12219  |   March 30, 2006


Deadlock/Livelock
In addition to choosing the method of serialization and type of coherence protocol, cache-coherence protocol designers must guarantee that the protocol is deadlock- and livelock-free, given limited resource/buffer constraints. This is particularly relevant in packet-switched, interconnect-based coherence.

There are two types of deadlocks—interconnect and protocol. Both generally occur due to buffer constraints in a packet-switched interconnect. Protocol deadlocks should be carefully considered when designing coherence protocols (Fig. 3a). Common schemes that prevent deadlocks include separating a transaction's request path from the reply/response path, and guaranteeing that a cache or a memory agent responds to a request in any state.

To accomplish the first scheme, designers usually use virtual channels [7] (Fig. 3b). Transactions flowing in any virtual channel follow a FIFO order, and a blocking event in the stream causes backpressure that can be traced all the way to the source. Hence, as long as the sinks (of transactions) make forward progress, so does the system.
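The effect of separating request and reply paths can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the `Agent` class, its bounded queues, and the `run` helper are hypothetical names, not part of any real interconnect. Two agents each start with a full buffer of requests addressed to them, and handling a request requires room for the reply at the peer.

```python
from collections import deque


class Agent:
    """Toy agent with bounded input buffering (hypothetical model)."""

    def __init__(self, capacity, separate_channels):
        self.capacity = capacity
        self.separate = separate_channels
        self.req_q = deque()
        # With one shared channel, replies queue up behind requests.
        self.rep_q = deque() if separate_channels else self.req_q
        self.completed = 0  # replies received and consumed

    def has_room(self, kind):
        queue = self.req_q if kind == "req" else self.rep_q
        return len(queue) < self.capacity

    def push(self, kind):
        (self.req_q if kind == "req" else self.rep_q).append(kind)

    def step(self, peer):
        """Handle at most one message; return True if anything moved."""
        # A dedicated reply channel is a pure sink and always drains.
        if self.separate and self.rep_q:
            self.rep_q.popleft()
            self.completed += 1
            return True
        if not self.req_q:
            return False
        if self.req_q[0] == "rep":
            # Shared channel: a reply that reached the head is consumed.
            self.req_q.popleft()
            self.completed += 1
            return True
        # A request can only be handled if the peer has room for the reply.
        if peer.has_room("rep"):
            self.req_q.popleft()
            peer.push("rep")
            return True
        return False


def run(separate_channels, capacity=2):
    """Return (replies completed, messages left stuck in buffers)."""
    a = Agent(capacity, separate_channels)
    b = Agent(capacity, separate_channels)
    for _ in range(capacity):
        a.push("req")
        b.push("req")
    while True:
        moved_a = a.step(b)
        moved_b = b.step(a)
        if not (moved_a or moved_b):
            break
    stuck = len(a.req_q) + len(b.req_q)
    if separate_channels:
        stuck += len(a.rep_q) + len(b.rep_q)
    return a.completed + b.completed, stuck


print("shared channel:   ", run(False))
print("separate channels:", run(True))
```

With a single shared channel, each request at the head of a queue waits for reply space that is occupied by the other agent's requests, so nothing moves. With a dedicated reply channel, the replies always drain and every request eventually completes.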

Livelock in a distributed system takes place when agents stay busy exchanging transactions, yet one or more of them never makes forward progress. At the processor, it shows up as a program counter stuck at some Load/Store. This frequently occurs when multiple caches repeatedly try, and fail, to gain ownership of a cache line. If a global serial order is properly established in the system, each agent can handle requests in that order. The global serial order itself must be established in a fair manner. Various resources (ports, buses, buffers) all need to be fairly allocated to the multiple threads/processors.
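One common way to allocate a shared resource fairly is round-robin arbitration. The sketch below contrasts it with a fixed-priority arbiter; it is a minimal illustration, and the class names and the three-requester setup are assumptions made for the example rather than anything prescribed by the article.

```python
class FixedPriorityArbiter:
    """Always grants the lowest-numbered active requester."""

    def grant(self, requests):
        for idx, wants in enumerate(requests):
            if wants:
                return idx
        return None


class RoundRobinArbiter:
    """Starts the search just after the most recent winner."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so that requester 0 is checked first

    def grant(self, requests):
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


def grant_sequence(arbiter, requests, cycles):
    """Grant once per cycle while the same requesters keep asking."""
    return [arbiter.grant(requests) for _ in range(cycles)]


always = [True, True, True]  # three agents contend on every cycle
fixed = grant_sequence(FixedPriorityArbiter(), always, 6)
fair = grant_sequence(RoundRobinArbiter(3), always, 6)
print("fixed priority:", fixed)
print("round robin:   ", fair)
```

Under constant contention, the fixed-priority arbiter grants only requester 0, so the others never advance, whereas the round-robin arbiter bounds each requester's wait to one full rotation.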

Another concern related to livelock prevention is flow control. A system's flow control limits resource allocation. Done in an ad hoc manner, it can result in livelock. A common case is overuse of retries or negative acknowledgements (NACKs) in response to requests.
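The risk with NACK-and-retry flow control can be seen in a deliberately simplified model. Everything in the sketch below is an assumption made for illustration: a single contended line, a `fast` requester that asks every cycle and wins ties, a `slow` requester that retries every fourth cycle, and a hypothetical `simulate` function that compares a pure NACK policy with one that queues refused requests in arrival order.

```python
from collections import deque


def simulate(policy, cycles=40, service=3, slow_retry=4):
    """Count grants per requester under the 'nack' or 'queue' policy."""
    grants = {"fast": 0, "slow": 0}
    busy_until = 0     # first cycle at which the line is free again
    pending = deque()  # remembered requests (used by 'queue' only)
    for t in range(cycles):
        # 'fast' asks every cycle and is listed first, so it wins ties.
        arrivals = ["fast"]
        if t % slow_retry == 1:
            arrivals.append("slow")
        if policy == "queue":
            # Remember each request once, in arrival order.
            for requester in arrivals:
                if requester not in pending:
                    pending.append(requester)
            if t >= busy_until and pending:
                grants[pending.popleft()] += 1
                busy_until = t + service
        else:
            # 'nack': serve the first arrival if free; all other requests
            # are refused and come back only on their own retry schedule.
            if t >= busy_until:
                grants[arrivals[0]] += 1
                busy_until = t + service
    return grants


print("nack :", simulate("nack"))
print("queue:", simulate("queue"))
```

In this model the NACK policy never serves the slow requester, because the line is always either busy or claimed by the faster one when the retry arrives. Queuing refused requests guarantees that every requester is eventually served.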

Other Considerations
Beyond deadlock and livelock, designers should consider the following issues:

  • Cache hierarchies and DMA: Issues of deadlock surface as transactions traverse the cache hierarchy. Usually, one can adopt the same mechanism used in the broader protocol to keep the requests and replies on separate (virtual or real) channels/FIFOs.

    Another issue concerns determining the level at which to enforce coherence (L1 cache, L2 cache, or L3 cache). Where will I/O insert cache lines into, or extract them from, the coherence domain? Solutions that involve issues of inclusion are usually very application- or system-specific. Hints for prefetching and data placement can be supplied to the coherence system by incorporating them into its transaction set. The obvious example is in routing, where the headers of an incoming IP packet need to be matched against a table to determine the destination buffer/interface for that packet. These headers can be placed close to the lower cache levels by coloring transactions with hints, such as Read/Write Hit/Miss policies.
  • Synchronization and barrier operations: Many ISAs offer various atomic primitives that must be mapped onto the coherence system. Load-linked/store-conditional (LL/SC), for instance, is a common atomic primitive in modern ISAs [4]. This form of atomicity is prone to livelock if not implemented correctly, and can lead to deadlock. Weaker memory systems require a safety net, called barriers, to force a certain behavior (usually between Stores and Loads issued from a processor or thread) during sensitive code sequences. This is generally achieved by inserting special barrier instructions supported by the ISA. The coherence system may need to respond to these operations by dynamically stalling certain transactions to support their behavior.
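The LL/SC behavior mentioned in the last item can be sketched with a small software model. This is an illustration under assumptions, not a description of any particular ISA: the `LLSCMemory` class, its per-CPU link table, and the `atomic_increment` helper are hypothetical, and real hardware tracks reservations at cache-line granularity through the coherence protocol.

```python
class LLSCMemory:
    """Toy memory with one load-linked reservation per CPU."""

    def __init__(self):
        self.mem = {}
        self.links = {}  # cpu id -> address it currently has linked

    def load_linked(self, cpu, addr):
        self.links[cpu] = addr
        return self.mem.get(addr, 0)

    def store(self, cpu, addr, value):
        self.mem[addr] = value
        # A write breaks every other CPU's reservation on this address.
        for other, linked in list(self.links.items()):
            if linked == addr and other != cpu:
                del self.links[other]

    def store_conditional(self, cpu, addr, value):
        if self.links.get(cpu) != addr:
            return False  # reservation lost; the caller must retry
        self.store(cpu, addr, value)
        del self.links[cpu]
        return True


def atomic_increment(memory, cpu, addr):
    """Retry the LL/SC pair until the conditional store succeeds."""
    while True:
        value = memory.load_linked(cpu, addr)
        if memory.store_conditional(cpu, addr, value + 1):
            return value + 1


mem = LLSCMemory()
v0 = mem.load_linked(0, 0x40)                    # CPU 0 links the line
v1 = mem.load_linked(1, 0x40)                    # CPU 1 links it too
ok1 = mem.store_conditional(1, 0x40, v1 + 1)     # CPU 1 gets there first
ok0 = mem.store_conditional(0, 0x40, v0 + 1)     # CPU 0 lost its link
final = atomic_increment(mem, 0, 0x40)           # CPU 0 retries
print(ok1, ok0, final)
```

In this model a reservation is broken only by a completed store, so one contender always succeeds. An implementation in which a competing load-linked also breaks other reservations could leave every store-conditional failing, which is the kind of livelock the text warns about.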

Further Reading:

  1. Hellestrand, G. R., Rapid Design of Software-Rich Chips: Executable Specification -> Realization, white paper, VaST Systems Technology Corp., Oct. 2002
  2. Raza Microelectronics XLR processors, www.razamicroelectronics.com/products/xlr.html
  3. Broadcom BCM1xxx communications processors, www.broadcom.com/products/Enterprise-Small-Office/Communications-Processors/BCM1480
  4. Culler, D. E., Singh, J. P., and Gupta, A., Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann Publishers, 1999
  5. Lenoski, Daniel E., and Weber, Wolf-Dietrich, Scalable Shared-Memory Multiprocessing, Morgan Kaufmann Publishers, 1995
  6. Benini, L., and De Micheli, G., "Networks on Chips: A New SoC Paradigm," IEEE Computer, 35(1), 2002
  7. Dally, W., and Towles, B., "Route Packets, Not Wires: On-Chip Interconnection Networks," Proceedings of DAC, June 2001
  8. Open Core Protocol Specification version 2.1, www.ocpip.org

