Multicast, a technique commonly used in networking systems, lets a processing unit send a single data stream to multiple destinations simultaneously. Typically, the switch that serves as the interconnect backbone implements the actual replication function. Because the switch is programmable, it can duplicate a given packet to any device connected to it.
For example, a server can send a video stream to multiple receivers simultaneously with a single transaction. Since the same packet is sent to all eligible endpoints in a given multicast group, the endpoints also need to be aware of, and support, the multicast protocol to take appropriate action for a multicast message.
Although networking and communications systems have implemented multicast schemes for some time, the concept is new to PCI Express (PCIe) systems. The PCI-SIG, the governing body for the PCI Express Base specification, recently ratified a multicast specification in the form of an Engineering Change Notice (ECN) specifically designed for PCIe. Yet there’s an alternate method for implementing multicast: using the integrated direct-memory-access (DMA) function inside a PCIe switch.
To get a sense of PCIe’s role in multicast, it helps to understand conventional PCI, the precursor to the PCIe standard. PCI was designed as a bus-based protocol shared among several devices. Bus-based protocols have broadcast built in, because data on the bus is seen by all devices. In the case of PCI, broadcast can be implemented when one initiator targets a receiver while other receivers listen in silent mode. This subset of multicast could be implemented using the Special Cycle command defined in the PCI protocol.
Figure 1 illustrates the broadcast mechanism implemented in the PCI bus. The flow process for issuing a broadcast message on the PCI bus is:
• The PCI bus master starts a transaction with the assertion of FRAME#.
• The Special Cycle (broadcast) command is issued on the C/BE[3:0]# lines.
• All slave devices accept the command and data from the master.
MULTICAST IN A PCIE SWITCH
PCIe switch vendors are beginning to implement the recent multicast ECN in new PCIe switch offerings. Because of the bandwidth and performance benefits, large switches will be among the first to support multicast, while switches with lower lane counts will follow. However, a number of applications that use these smaller PCIe switches also require multicast.
For such applications, the multicast function described in the ECN can be implemented using PCIe switches with integrated DMA, which are now shipping in volume. Using the DMA controller in the PCIe switch offers an efficient and attractive alternative to implementing multicast with devices available now.
IMPLEMENTING MULTICAST USING DMA IN A PCIE SWITCH
A DMA engine is typically used to offload data transfers from the CPU, moving data from local memory out to devices connected to the other side of the interconnect. Generally, DMA engines reside in endpoints such as a storage or network endpoint. The DMA controllers on these devices are application-specific and can only transfer data between themselves and system memory. A generic DMA engine can also be used to transfer large amounts of data from local memory to remote memory.
PCIe continues to be the interconnect of choice in a wide range of applications across multiple industry segments. Integrated DMA in a PCIe switch provides the capability to move large amounts of data from local memory to devices attached to the switch, returning CPU cycles for time-critical applications. This capability for offloading the CPU plays a bigger role in embedded systems running real-time operating systems.
PCIe switches with 16 lanes or fewer that integrate DMA engines have recently been introduced. The DMA engine in these devices supports four DMA channels, which can be independently programmed and controlled. The DMA engine is also flexible, so it can be used in a wide range of applications.
The DMA engine appears as another function on the upstream port (Fig. 2). This function has a TYPE 0 configuration header, and it follows the standard PCIe driver model. The driver for the DMA engine programs its DMA channels by writing to internal registers in the DMA function.
Using the DMA engine in the switch for multicast requires software to construct a multicast descriptor ring. Every descriptor in the ring has the same source address, which points to the same transmit buffer, but a different destination address that corresponds to a destination port on the PCIe switch. The number of devices in a given multicast group determines the number of descriptors in the ring. The DMA engine can optionally generate an interrupt upon completion of the multicast ring.
Although multiple DMA channels are supported, a single channel is enough to support the multicast function. Each descriptor in a DMA channel’s ring comprises four doublewords (Fig. 3):
• A destination address
• A source address
• Transfer size
• Control
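As a rough illustration of the descriptor ring described above, the following Python sketch builds one descriptor per multicast destination, all sharing a single source buffer. It is a minimal model under stated assumptions, not the switch's actual programming interface: the doubleword order simply follows the list above, while the 32-bit little-endian packing, the control-bit layout (CTRL_INTERRUPT_ON_DONE), the function name build_multicast_ring, and the example addresses are all hypothetical. A real driver would take these details from the switch's data sheet.

```python
import struct
from typing import List, NamedTuple

# Hypothetical control flag: the real bit layout is device-specific.
CTRL_INTERRUPT_ON_DONE = 1 << 0


class Descriptor(NamedTuple):
    """One ring entry: four 32-bit doublewords."""
    dest_addr: int
    src_addr: int
    size: int
    control: int

    def pack(self) -> bytes:
        # Assumed encoding: four little-endian doublewords in list order.
        return struct.pack("<4I", self.dest_addr, self.src_addr,
                           self.size, self.control)


def build_multicast_ring(src_addr: int, size: int, dest_addrs: List[int],
                         interrupt_on_last: bool = True) -> List[Descriptor]:
    """Build one descriptor per destination, all reading the same buffer."""
    ring = []
    last = len(dest_addrs) - 1
    for i, dest in enumerate(dest_addrs):
        # Optionally request an interrupt only after the final copy.
        is_last = interrupt_on_last and i == last
        control = CTRL_INTERRUPT_ON_DONE if is_last else 0
        ring.append(Descriptor(dest, src_addr, size, control))
    return ring


# Example: a three-member multicast group (addresses are made up).
ring = build_multicast_ring(
    src_addr=0x1000_0000,
    size=4096,
    dest_addrs=[0x8000_0000, 0x9000_0000, 0xA000_0000],
)
ring_bytes = b"".join(d.pack() for d in ring)
```

In this model, the ring length equals the number of devices in the multicast group, and only the last descriptor requests an interrupt, which mirrors the optional completion interrupt mentioned earlier.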