Cadence/PCI-SIG (redrawn by William Wong/EBM)

What’s the Difference Between the Palladium Emulator and an FPGA for PCIe Debug?

April 3, 2025
PCIe validation is easier and faster using the Palladium Emulator versus applying FPGA-based methods.


PCI-SIG’s Peripheral Component Interconnect Express Gen5 (PCIe Gen5) is an interconnect protocol used primarily for high-rate data transfers between components in a system. PCIe Gen5 achieves a transfer rate of 32 GT/s per lane. PCIe is integrated in almost all computer systems, including servers.

PCIe is a complex protocol. Validating it covers link training, transaction layer packet (TLP) generation and transactions, different payload transfers, error TLPs, flow control, and recovery states, in both root-complex (RC) and endpoint (EP) modes. It’s crucial to validate the protocol even before verification of the entire system gets underway.

Verification engineers verify the PCIe protocol through Universal Verification Methodology (UVM) testbenches. Alongside that verification setup, emulation engineers provide a platform for validating the PCIe protocol and for developing software. This article describes the steps required to validate the PCIe Gen5 protocol and the software on the Palladium Emulator.

Emulator Hardware Requirements for PCIe Validation

To validate the PCIe Gen5 protocol on an emulator, the following items are required:

  • PCIe Gen5 RTL/IP
  • Software or OS—it can be loaded into the emulator
  • PCIe Gen5 virtual SpeedBridge

This hardware setup is needed to validate the PCIe Gen5 protocol:

  • Palladium Emulator
  • One Linux InfiniBand (IB) host
  • PCIe Gen5 SpeedBridge or PCIe Gen5 High Density Speed Bridge (HDSB)
  • SSDs that support Gen3, Gen4, or Gen5
  • Base to connect PCIe SpeedBridges and SSDs
  • Environment Development Kit (EDK), a Linux host or PCIe Gen5 Host
  • PPOE and fiber cables

PCIe Configuration and Hardware Setup

Before setting up the hardware, it’s important to know the PCIe configuration. The example here includes one Linux or PCIe Gen5 host (EDK) and four SSD targets. For Gen5, I’ve taken 16 lanes, each running at the Gen5 rate of 32 GT/s. The figure shows the complete setup. This configuration may change based on the design.

In the design, we need an emulation wrapper file that instantiates the PCIe IP design and PCIe Gen5 virtual SpeedBridge. The PCIe IP supports both EP and RC mode. In EP mode, the Linux host (EDK) is the root complex, and the IP is the endpoint; it uses all 16 lanes for data transfer. In RC mode, the IP (Palladium Emulator) is a root complex, and four SSDs are the endpoints.

Therefore, we need five PCIe SpeedBridge instances: one for EP mode (16 lanes) and four for RC mode. Each RC-mode SpeedBridge instance supports four lanes, so the four SSD links together also use 16 lanes.

The Palladium Emulator contains TPODs, which are required for communication between the emulator and the SpeedBridges. TPODs are ports on the emulator that each consist of a power port and a data port. The power port supplies power to the SpeedBridge connected to it, and a fiber cable carries the data. Since five SpeedBridges are required for the PCIe host and SSD targets, five TPODs must be enabled.
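The bridge and TPOD counts follow directly from the link topology. The sketch below records that topology as plain Python data and checks the arithmetic. The class, field, and function names are hypothetical and aren't part of any Cadence tool; the one-TPOD-per-SpeedBridge rule is taken from the description in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeLink:
    """One SpeedBridge instance in the emulation wrapper (hypothetical model)."""
    name: str
    ip_mode: str  # mode of the PCIe IP on this link: "EP" or "RC"
    lanes: int


def example_topology() -> list[BridgeLink]:
    # EP mode: the EDK host is the root complex and uses all 16 lanes.
    links = [BridgeLink("host_edk", "EP", 16)]
    # RC mode: the IP is the root complex for four x4 SSD endpoints.
    links += [BridgeLink(f"ssd{i}", "RC", 4) for i in range(4)]
    return links


def required_tpods(links: list[BridgeLink]) -> int:
    # Each SpeedBridge needs its own TPOD for power and data.
    return len(links)


def lanes_in_mode(links: list[BridgeLink], ip_mode: str) -> int:
    # Total lanes across all bridges that serve the given IP mode.
    return sum(link.lanes for link in links if link.ip_mode == ip_mode)


if __name__ == "__main__":
    topo = example_topology()
    print("TPODs to enable:", required_tpods(topo))
    print("RC-mode lanes:", lanes_in_mode(topo, "RC"))
    print("EP-mode lanes:", lanes_in_mode(topo, "EP"))
```

Keeping the topology as data makes it easy to re-check the counts when the configuration changes with the design.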

Palladium emulation involves two steps: compiling and synthesizing the design, and running the job.

For the first step, two modes are available:

  1. In-Circuit Emulation (ICE) mode/Legacy mode, in which only the synthesizable design is compiled, synthesized, and loaded into the emulator. The non-synthesizable part, such as the UVM testbench and system tasks, runs only on the IB host. The IB link is used to drive the transactions into the emulator.
  2. Simulation Acceleration (SA) mode through the IXCOM flow, in which both the non-synthesizable and synthesizable parts of the design are compiled, synthesized, and downloaded into the emulator. The non-synthesizable part of the design includes file handling, initial statements, assertions, and system tasks supported by the Verilog/SystemVerilog HDL.

Because this design contains non-synthesizable constructs, such as file handling and system tasks, that must be downloaded into the emulator, it’s compiled and synthesized for Palladium using the IXCOM flow. To run the job, a separate emulation environment is required. Please refer to the Cadence website for details, as the emulation flow contains Cadence proprietary information.
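Before choosing a flow, it can help to see which non-synthesizable constructs a source file contains. The following Python sketch is a rough text-matching heuristic for the construct families named above (file handling, system tasks, initial statements, and assertions). It's only an illustration: the pattern list and function name are assumptions, it isn't how IXCOM classifies code, and it doesn't parse Verilog, so it can miss constructs or report false matches (in block comments or strings, for example).

```python
import re

# Hypothetical patterns, one per construct family listed in the article.
PATTERNS = {
    "file handling": re.compile(r"\$(?:fopen|fclose|fwrite|fdisplay|fscanf)\b"),
    "system task": re.compile(r"\$(?:display|monitor|finish|stop)\b"),
    "initial block": re.compile(r"\binitial\b"),
    "assertion": re.compile(r"\bassert\b"),
}


def find_testbench_constructs(source: str) -> dict[str, int]:
    """Count matches per construct family, ignoring // line comments."""
    code = re.sub(r"//.*", "", source)
    counts = {name: len(p.findall(code)) for name, p in PATTERNS.items()}
    # Report only the families that actually occur.
    return {name: n for name, n in counts.items() if n}


SAMPLE = """
module tb;
  integer fd;
  initial begin
    fd = $fopen("trace.log", "w"); // open the log
    $display("link up");
  end
endmodule
"""

if __name__ == "__main__":
    for family, count in find_testbench_constructs(SAMPLE).items():
        print(f"{family}: {count}")
```

A non-empty result suggests the file carries testbench-style code, which ICE mode would leave on the host rather than load into the emulator.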

Debugging the PCIe Design on the Palladium Emulator

Assume the emulation flow is developed and the emulator is ready to use for testing the PCIe IP and the software. The following scenarios describe some of the debugging techniques:

  • Suppose the Palladium Emulator is in RC mode with four SSD targets as endpoints, and the RC is unable to establish a link with any SSD target. Two causes are possible: the design itself has bugs, or the software doesn’t generate a proper training sequence. In this case, the Palladium offers a very powerful technique: probing the design signals and generating waveforms during runtime. The advantage is that there’s no need to resynthesize the design, because all or part of the design can be probed and captured as waveforms. If the RTL is working fine, then the software must be investigated for the potential issue.
  • If the design works fine and establishes the link with three SSD targets but is unable to do so with the fourth one, then investigate the non-working target and power cycle it. Power cycling means unplugging and re-plugging the power and fiber cables of the failing SpeedBridge. If that doesn’t work, then change the SpeedBridge or the cables and try again. If it’s still a no-go, then Cadence will help debug the issue.
  • Consider the Palladium in endpoint mode, with the EDK as the host or root complex. The EDK boots and establishes the link with the endpoint. However, while reading the configuration registers from the endpoint, the EDK might not receive the TLP and time out. This leaves the EDK stuck, requiring a power cycle to bring it back to a normal state. Possible causes are an RTL bug in endpoint mode, the EDK hanging and failing to receive the TLP transmitted by the endpoint, or the software providing information to the endpoint that the design doesn’t support. Again, the signals can be probed without resynthesizing the design, and waveforms can be generated to investigate the issue.
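The first two RC-mode scenarios amount to a simple triage rule based on how many targets fail to link. A minimal Python sketch of that rule follows; the function name and the returned messages are illustrative, not part of any Cadence tool, and the per-target link status is assumed to be available as a list of booleans.

```python
def triage_rc_links(link_up: list[bool]) -> str:
    """Suggest a first debug step from per-target link status in RC mode."""
    if not link_up:
        raise ValueError("at least one target is required")
    if all(link_up):
        return "all links trained; no link-level action needed"
    if not any(link_up):
        # No target links: suspect the RTL or the software training sequence.
        return "probe design signals at runtime; if RTL is clean, check software"
    # Only some targets fail: suspect that target's hardware path first.
    failing = [i for i, up in enumerate(link_up) if not up]
    return f"power cycle SpeedBridge(s) {failing}; then swap the bridge or cables"


if __name__ == "__main__":
    print(triage_rc_links([False, False, False, False]))
    print(triage_rc_links([True, True, True, False]))
```

The ordering reflects the reasoning in the scenarios: a failure across every target points at the design or software, while an isolated failure points at one SpeedBridge, its cables, or the target itself.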

Once a bug is discovered in the RTL, the design must be resynthesized and implemented on the Palladium to test the fix. Resynthesis usually changes the clock frequency by only a few kilohertz unless a big feature is added. The SpeedBridges ensure data integrity by performing rate adaptation, so design performance is maintained.

Resynthesis on Palladium doesn’t require meeting timing or applying placer constraints and implementation strategies, all of which consume time. The compiler determines the operating clock frequency, and the SpeedBridge maintains performance through rate adaptation. Because the design can be debugged during runtime, debugging on the Palladium can be performed efficiently.

Debugging the PCIe Design on an FPGA-Based Platform

Implementing the PCIe design on an FPGA is beyond the scope of this article. The assumption is that the required hardware, PCIe IP, and software are implemented, and that the FPGA synthesis and place-and-route (PNR) flow is also available. The following scenarios describe debugging techniques on FPGA-based platforms:

  • Considering the same example described above, the PCIe design is unable to enumerate the four SSD targets. To debug the issue, the design must be probed to generate waveforms. The signals can be probed by using the Integrated Logic Analyzer (ILA)/ChipScope, which requires preserving the signals to prevent them from being optimized away. The design must be synthesized and implemented on an FPGA with the extra instrumentation logic required for the ILA. The signals probed into the ILA and the instrumentation logic require the same clock for the ILA to capture the data. For example, if the PCIe signals run at 100 MHz, then the ILA logic also requires a 100-MHz clock. Therefore, to probe the signals for debug, the FPGA designer must meet timing between the design and the ILA logic as well. If the signals probed aren’t enough to debug the design, then the FPGA design and implementation cycle continues until the bug is discovered and fixed.
  • Another FPGA prototyping tool that can be used is Protium, which is an FPGA-based platform. The PCIe setup on Protium is like the one on Palladium, with a few changes, but it’s beyond the scope of this article. To probe the signals on the Protium and generate waveforms, the signals must be preserved and the trigger must be applied in the scripts. This requires resynthesis and re-implementation of the design on the Protium with the trigger signals enabled. The instrumentation logic required to probe the signals is automatically implemented by the Protium CAD tools. Even so, the design and implementation cycle on the Protium continues until the bug is discovered and fixed.

What’s evident is that debugging on FPGA platforms can be a challenging and time-consuming process. The entire design must be implemented to generate waveforms, which involves meeting timing constraints and successfully deploying the design onto the FPGA. Clearly, FPGA-based debugging is less efficient than the Palladium-based approach.

Advanced Emulation is the Best Debug Route

Effective validation and debugging are essential to ensure the reliability and performance of PCIe Gen5 systems, which are integral to high-speed data transfer in modern computing. Platforms like the Palladium provide a streamlined and efficient approach to debugging through runtime signal probing and real-time analysis, significantly reducing development time.

While FPGA-based platforms offer alternative methods for validation, their complexity and time-intensive nature make them less efficient for iterative debugging. By leveraging advanced emulation tools and methodologies, engineers can enhance the design process, identify issues early, and ensure seamless integration of PCIe Gen5 protocols in complex systems.

About the Author

Kunal Ashokkumar Doshi | Emulation Engineer, Microsoft Corporation

Kunal Doshi is a seasoned VLSI Design Engineer specializing in emulation, FPGA prototyping, and static timing analysis. With a strong technical background and expertise in various programming languages and EDA tools, he excels in resolving complex issues and driving business development through innovative solutions.
