By Premduth Vidyanandan, Xilinx Inc., Longmont, CO, [email protected]
With the increasing size and complexity of FPGA devices, there is a need for more efficient verification methods. Timing simulation can be the most revealing verification method; however, it is often one of the most difficult and time-consuming for many designs. Timing simulations that were traditionally measured in minutes or hours on standard desktop computers can now, for some projects, take days or weeks and require high-powered 64-bit servers. This cuts into the time-to-market and cost-of-implementation advantages of using FPGAs in the first place.
One of the biggest challenges that FPGA design and verification engineers face today is time and resource constraints. With FPGAs growing in speed, density and complexity, a full timing verification taxes not only manpower but also computer processors and available memory. Furthermore, the design and verification engineer (who many times is the same person) faces an escalating challenge: properly testing today’s FPGA designs in shorter timeframes with increased confidence of first-pass success.
Importance of Timing Simulation
Today’s FPGAs need both functional and timing simulation in order to ensure that designs work and continue to work. FPGA designs are growing in complexity and the traditional verification methodologies are no longer sufficient. In the past, simulation was not an important stage in the FPGA design flow. Currently, however, it is becoming one of the most critical. Timing simulation is especially important when designing with the more advanced FPGAs such as the Virtex-5 FPGA Family from Xilinx.
Traditional FPGA verification methods are:
1. Functional simulation
Functional simulation is a very important part of the verification process, but it should not be the only part. A functional simulation tests only the functional behavior of the RTL design. It does not include any timing information, nor does it take into consideration changes made to the original design due to implementation and optimization.
2. Static Timing Analysis / Formal Verification
Most engineers see this as the only analysis needed to verify that the design meets timing. There are many drawbacks to using this as the only timing analysis methodology. Static analysis cannot find any of the problems that can be seen only when running a design dynamically. This analysis can show only whether the design as a whole meets setup and hold requirements, and it is generally only as good as the timing constraints applied. In a real system, dynamic factors can cause timing violations on the FPGA. One example is block RAM collisions. With the introduction of dual-port block RAMs in FPGA devices, care must be taken not to read from and write to the same location at the same time, as this will result in incorrect data being read back. Static analysis tools will never be able to find this problem. Similarly, if the timespecs are incorrect or incomplete, static timing analysis will not flag the resulting problems.
3. In-System Testing
Virtually every engineer relies on this method as the ultimate test: if the design works on the board and passes the test suites, then it is ready to be released. This is a very good test, but it may not catch all the problems right away. At times the design needs to run for quite some time before corner-case issues manifest, and issues such as timing violations may not manifest themselves in the same way in all chips. By that time the design is often in the end customer’s hands, which means high costs, downtime and frustration while trying to figure out the problem. To complete proper in-system testing, all the hardware hurdles must first be overcome, such as problems with simultaneous switching outputs (SSO), crosstalk and other board-related issues. If external interfaces need to be connected before in-system testing can begin, this further increases the time to market of the product.
As can be seen from the above, the traditional methods of verification are not sufficient on their own for a fully verified system. There is a compelling reason to do dynamic timing analysis.
Timing simulation is the only way in which dynamic analysis can be done, yet many engineers have what they consider compelling reasons for avoiding it. Some of the main concerns are:
· It is time-consuming.
· It takes a lot of memory and processor power to verify the design.
· There is no way to re-use the testbench from functional simulation; new testbenches have to be created.
· Debugging the design turns out to be a chore as the whole netlist is flattened and there is no way to single out the problem in a timely manner.
· Timing simulation shows the worst-case numbers. The design has enough slack to not be concerned.
· Not all the sub-modules are coded at the same site, and there is no way to split out the parts coded at each site so that the designers of those parts, who understand them best, can verify them.
These are valid concerns, which is why the next section covers what engineers can do to overcome some of these hurdles.
Obtaining Accurate Results Using Netgen for Timing Simulation
Xilinx has developed a method to make the static timing analysis numbers and the timing numbers that Netgen produces for dynamic analysis match. Running Netgen with the -pcf switch and pointing it to a valid PCF file ensures that the numbers from Trce and Netgen will match.
All the new Xilinx architectures take advantage of relative minimums for timing calculations. Using relative minimums means that the maximum clock delay is paired with the minimum data delay for setup calculations, and vice-versa for hold calculations. Current simulators do not support taking one number from the MIN field of an SDF file and another number from the MAX field of the same file. Due to this limitation, Xilinx requires two separate simulations: one for setup checks and another for hold checks.
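As an illustration, such a flow might look like the following minimal sketch, written as ModelSim-style commands. The file and instance names are hypothetical, and the exact switches should be confirmed against the Netgen and simulator documentation for the tool versions in use.

    # Back-annotate the routed design, pointing Netgen at the PCF file so that
    # the SDF numbers match static timing analysis
    netgen -sim -ofmt verilog -pcf design.pcf design.ncd design_timesim.v

    # Setup-check simulation: annotate the MAX field of the SDF file
    vsim -sdfmax /testbench/uut=design_timesim.sdf testbench

    # Hold-check simulation: annotate the MIN field of the same SDF file
    vsim -sdfmin /testbench/uut=design_timesim.sdf testbench

The two runs are identical except for the SDF switch, so they can be scripted back to back.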
Netgen writes out the SDF file such that when a SDFMAX simulation is run, the maximum clock delays and the minimum data delays are used. SDFMAX ensures that a design meets the setup requirements for the target device. When a SDFMIN simulation is run, the minimum clock delays and maximum data delays are used. SDFMIN ensures that a design meets the hold requirements for the target device.
Improving the Timing Simulation Experience
The commonly used phrase “the whole is greater than the sum of its parts” can almost be reversed with respect to timing simulation to say, “the sum of the parts is greater than the whole.” That phrase sums up what this section will cover. To cut down on the time spent on timing simulation, we have to rely on a “divide and conquer” method. For one big flattened netlist, any form of verification is a time-consuming and tedious task. Hence the solution is to break the netlist into smaller components.
This methodology is not revolutionary to the digital logic world; it is evolutionary. Ever since HDLs have been around, designers have preferred component-based simulation to simulating one big design. The problem was that there was no way to carry this method into the world of timing simulation. That is no longer the case, thanks to advancements in keeping hierarchy throughout FPGA implementation. The idea behind this is simple: most designs are created from smaller blocks, and verification is done on each sub-module.
A capability called KEEP_HIERARCHY was introduced some time ago. It allows the design to maintain its hierarchy even as it goes through implementation. This was a small step toward improving the timing simulation solution, but the real problem it helped solve was the debugging stage. The design is no longer a flattened netlist: the back-annotated HDL file has pieces of hierarchy that match the original design. If a problem is found when doing a timing simulation, it is much easier to debug it and narrow it down to the source of the issue. As mentioned earlier, this was only a stepping stone to the full capabilities of this feature.
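As a hypothetical illustration of how hierarchy can be preserved for a block, a KEEP_HIERARCHY attribute can be attached to a module in the source. The sketch below uses XST-style Verilog attribute syntax with invented module names; the equivalent can be done with a VHDL attribute or with an INST constraint in the UCF file.

    // Preserve this module as its own piece of hierarchy through implementation
    (* KEEP_HIERARCHY = "YES" *)
    module sub_module_a (
      input            clk,
      input      [7:0] din,
      output reg [7:0] dout
    );
      // Simple registered behavior, standing in for the real block
      always @(posedge clk)
        dout <= din + 8'd1;
    endmodule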
The next step for KEEP_HIERARCHY was the ability to create “Multiple Hierarchical Files.” This feature was introduced into the software tools to write a separate netlist for each piece of the hierarchy, along with a corresponding SDF (Standard Delay Format) file. Its introduction opened the door to a variety of timing simulation methods. Once a different file can be written for each piece of the hierarchy, each timing module looks the same as its RTL version. This makes it possible to reuse the testbenches that were used for functional simulation, which was a big step forward in timing simulation.
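A back-annotation run that writes one netlist and one SDF file per preserved piece of hierarchy might look roughly like the following sketch; the -mhf and -dir switches and the file names here are assumptions to be checked against the Netgen documentation for the ISE version in use.

    # Write multiple hierarchical files: a separate simulation netlist and SDF
    # file for each piece of preserved hierarchy, plus the top level
    netgen -sim -ofmt verilog -mhf -pcf design.pcf -dir netgen_mhf design.ncd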
Engineers no longer need to write a separate testbench just for timing simulation. If a testbench has been written for functional verification, almost no work is needed to reuse the exact same testbench for timing simulation, because the port names at the top level of each module remain the same. One of the main advantages of this approach is that it makes it easier to pinpoint a problem. To understand the true power of this feature, we will look at a real-world example.
Figure 1 - Top Module showing individual Sub Modules
In Figure 1, Sub Module A is created first by Engineering Team 1, Sub Modules B and C are created by Engineering Team 2, and IP Module D is purchased from a third party. These blocks are created at different times and/or by different engineers, and each module is verified with its own testbench to prove that it functions correctly. Once all the individual pieces have been successfully verified, they are assembled for implementation into the FPGA. This is usually how the RTL simulation is done. Now, with the ability to use MHF (Multiple Hierarchical Files) in conjunction with KEEP_HIERARCHY, it is possible to maintain the same strategy even for timing simulation.
Using this feature helps solve two of the biggest dilemmas faced by designers who attempt a timing simulation: 1) the ability to reuse the testbenches for each module, and 2) the ability to pinpoint the specific module that is causing a problem. There are multiple ways to run a timing simulation. Since the top-level ports of each of these modules remain the same when using MHF, the RTL testbenches can easily be reused.
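For instance, a testbench written against the RTL of a block can drive the back-annotated version of that block without modification, because the module name and ports are unchanged; only the file that is compiled differs. A minimal sketch with invented names:

    `timescale 1ns / 1ps
    module tb_sub_module_a;
      reg        clk = 1'b0;
      reg  [7:0] din = 8'd0;
      wire [7:0] dout;

      // The same instantiation works whether sub_module_a comes from the RTL
      // source or from the back-annotated netlist written by Netgen
      sub_module_a uut (.clk(clk), .din(din), .dout(dout));

      always #5 clk = ~clk;

      initial begin
        repeat (20) @(posedge clk) din <= din + 8'd1;
        $finish;
      end
    endmodule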
Having the final netlist in modular form also enables the user to swap individual modules for their RTL equivalents, which speeds up simulation runtimes. RTL almost always simulates significantly faster than structural netlists, so if structural code can be replaced with RTL without impacting the functionality of the design, this option should be exercised. Almost no design works flawlessly as soon as it has been implemented, which is why timing simulation is needed in the first place.
Using the same example as above, we can look at how to improve both the speed of simulation and the visibility into the complete design. To get the smallest runtimes, it is ideal to run timing simulation on only one module at a time. In this case we would run timing simulation on Sub Module A while keeping Sub Modules B, C and D in RTL form. Once that simulation works as expected, each Sub Module can be switched out and tested in the same manner. With this methodology, if a problem is found in one of the Sub Modules, it is easy to pinpoint that Sub Module and send it back to its author to fix. If multiple Sub Modules exhibit problems, the added advantage is that different engineering teams can work on the problems at the same time.
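Continuing the hypothetical example, a ModelSim compile-and-run script for this mixed simulation might look roughly like the following, with only Sub Module A back-annotated and the rest left in RTL form. The file, instance and library names are assumptions, and the SIMPRIM library and the glbl module are assumed to be compiled already.

    # Testbench plus RTL for the top level and Sub Modules B, C and D
    vlog tb_top.v top.v sub_module_b.v sub_module_c.v ip_module_d.v

    # Back-annotated netlist for Sub Module A only
    vlog sub_module_a_timesim.v

    # Annotate Sub Module A's SDF onto its instance and run the mixed simulation
    vsim -sdfmax /tb_top/top_inst/sub_module_a_inst=sub_module_a_timesim.sdf tb_top glbl
    run 500 us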
In the traditional flow, problems found in one part of the design had to be fixed before the designer could look at the other parts. The MHF flow removes this restriction. Another major complaint from timing simulation users is that when the other engineering team is in another country, finishing the final verification becomes difficult and time-consuming, because a lot of time is lost and there is a lot of dependency between teams in traditional timing simulation methodologies. With the MHF methodology, that dependency is removed, cutting out much of the idle time between engineering teams and ensuring that the teams are used to their full efficiency. A modular netlist structure also aids the verification group: what used to be done by one verification engineer can now be split among a group of verification engineers. The same ideas that apply to the development group can be applied to the verification group.
In addition to continuous advancements in the world of simulation, there have also been major advancements in methods of applying stimulus. Designs used to be extremely small, so an earlier method of stimulating a design was to toggle each signal using force files or simple stimulus at the simulator prompt. As designs grew more complex, the need for a better stimulus methodology arose. This is where the power of VHDL and Verilog came into play: with the introduction of HDL coding languages, testbenches became more complex and compact.
In this field, technologies such as PSL, SystemC and SystemVerilog have arisen; coverage of these languages is beyond the scope of this article. One disadvantage of these coding styles is that it is difficult to use the output of one simulation as the stimulus for another. Some simulators support the Extended Value Change Dump (VCD) format, which allows the user to do exactly that. The main hurdle that kept users from applying this method to timing simulation was that the port names change when everything gets flattened, so there was no way to use the output as stimulus. With MHF this problem goes away: there are now individual modules to which stimulus can be applied, so the output of one module can be used as the stimulus of another for both RTL and timing simulation.
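As a rough sketch of how this might be scripted in a simulator that supports extended VCD (the commands follow ModelSim conventions and the design names are invented; both should be checked against the simulator documentation):

    # During the full simulation, record all port activity at the boundary of
    # Sub Module B in an extended VCD file
    vcd dumpports -file sub_module_b_ports.vcd /tb_top/top_inst/sub_module_b_inst/*

    # Later, re-simulate only Sub Module B (RTL or back-annotated), replaying
    # the recorded port activity as its stimulus
    vsim -vcdstim sub_module_b_ports.vcd work.sub_module_b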
Choosing the Hierarchy
A major part of achieving success in hierarchical simulation is picking the hierarchy. There is no given formula for picking the correct hierarchy, and for that reason there is no single right or wrong hierarchy, although there are some guidelines that can be used when choosing one. It is always good design practice to ensure the following guidelines are met.
· The design should be fully synchronous.
· All critical paths should be contained within one Logic Group (a piece of the design that can be synthesized separately). Typically, each Logic Group is a module in Verilog or an entity in VHDL that is instantiated in the top-level of the design.
· All IOB (Input/Output Block) logic should be at the top level. Every input or output of the device should be declared in the top level, as should I/O buffers and I/O tristates. However, instantiated I/O logic within a Logic Group is acceptable.
· Registers should be placed on all of the inputs and/or outputs of each Logic Group. A good design practice is to register all input signals or all output signals at the Logic Group boundaries, as illustrated in the sketch following this list. This ensures that the critical paths inside of a Logic Group are maintained and eliminates possible problems with logic optimization across Logic Group boundaries. This rule should be followed consistently for all hierarchy groups in the design.
· The top-level should contain only instantiated modules or entities, IOB logic, and clock logic (DCMs, BUFGs, etc.).
· Logic Groups should be chosen so that no group is so small that it is trivial or less relevant to verify separately, but not so large that it becomes unwieldy to simulate and debug should a problem arise. There is no exact formula for this, and it can change depending on the design and the requirements for verification.
· Logic Groups should be selected so that the portions of the design most likely to change late in the design flow are isolated from other, more stable portions of the design. When properly selected, this allows late design changes to have a lesser effect on verification runtimes.
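To make these guidelines concrete, the following is a minimal Verilog sketch, with invented module and signal names, of a Logic Group with registered inputs and outputs instantiated from a top level that contains only I/O buffers, clock logic and module instantiations.

    // A Logic Group with registers on its inputs and outputs, so critical
    // paths stay inside the module boundary
    (* KEEP_HIERARCHY = "YES" *)
    module filter_block (
      input            clk,
      input      [7:0] data_in,
      output reg [7:0] data_out
    );
      reg [7:0] data_in_r;

      always @(posedge clk) begin
        data_in_r <= data_in;            // registered input
        data_out  <= data_in_r ^ 8'hA5;  // registered output
      end
    endmodule

    // Top level: only instantiations, IOB logic and clock logic
    module top (
      input        clk_pad,
      input  [7:0] din_pad,
      output [7:0] dout_pad
    );
      wire       clk_ibufg_o, clk_buf;
      wire [7:0] din_int, dout_int;

      // Clock logic at the top level
      IBUFG clk_ibufg (.I(clk_pad), .O(clk_ibufg_o));
      BUFG  clk_bufg  (.I(clk_ibufg_o), .O(clk_buf));

      // Explicit I/O buffers at the top level
      genvar i;
      generate
        for (i = 0; i < 8; i = i + 1) begin : io_bufs
          IBUF din_ibuf  (.I(din_pad[i]),  .O(din_int[i]));
          OBUF dout_obuf (.I(dout_int[i]), .O(dout_pad[i]));
        end
      endgenerate

      // Instantiated Logic Group
      filter_block u_filter (.clk(clk_buf), .data_in(din_int), .data_out(dout_int));
    endmodule

A real top level would also include any DCMs and tristate buffers; the point is simply that the registered boundary keeps the critical paths inside the Logic Group.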
Preserving hierarchy should not affect the performance of the design as long as the above guidelines are followed. To obtain maximum benefit, hierarchy should be preserved only for blocks whose ports need to be visible during gate-level simulation. These will typically be the upper-level blocks in the design that follow the guidelines listed above. By limiting the preservation of hierarchy to selected blocks, the synthesis and implementation tools have more freedom to optimize the design and improve performance. Figure 2 below shows an example of where hierarchy can be preserved in an example design.
Figure 2 - Example of choosing hierarchy for preservation
It should be noted that these are only guidelines. There is no set rule that dictates how hierarchy should be chosen or maintained. It does vary from design to design as well as from user to user. It is up to the user to decide where it makes the most sense for hierarchy to be maintained for verification and where it should be dissolved.
Hierarchical Verification Put to Practice
In order to quantify the possible benefits of taking a hierarchical approach to timing simulation, we will examine two designs, one VHDL and one Verilog, both targeting mid-size Xilinx FPGAs and simulated for 500 microseconds using the Model Technology ModelSim SE simulator. These simulations were run on a dual 2.0 GHz Xeon machine with 2 GB of RDRAM memory running the Linux operating system. This is a modest attempt to illustrate the magnitude of difference this methodology can make; it is not necessarily representative of typical simulation runtimes or memory requirements.
The VHDL design represents a fairly typical DSP-oriented design targeting a Xilinx Virtex-4 SX35 FPGA. We chose to split this design into nine sub-level pieces and one top-level piece by placing a KEEP_HIERARCHY constraint on each desired sub-section. For this test we chose the most volatile section of code, the one changing most frequently at this point in the design flow. Performing a relatively simple simulation and comparing the RTL simulation time to that of the timing simulation of the whole design, we find a significant increase in runtime and memory requirements, as shown in Table 1 below. If, however, we perform a timing simulation on only the portion of the design that changed, we reduce the runtime and memory requirements by 24x and 21x respectively. Even if we verify the entire design with only the changed section back-annotated for timing, we see only an approximate doubling of runtime and memory requirements relative to the full RTL simulation, which is still a large reduction compared to a traditional full timing simulation.
Type of simulation                         | VHDL Design Runtime / Memory | Verilog Design Runtime / Memory
Full RTL Simulation                        | 6.4 minutes / 28.8 MB        | 18.1 minutes / 26 MB
Full Timing Simulation                     | 186.2 minutes / 775 MB       | 176.9 minutes / 742 MB
Timing simulation of subsection            | 7.7 minutes / 35.8 MB        | 28.0 minutes / 112 MB
Full simulation, timing only on subsection | 13.8 minutes / 56 MB         | 48.9 minutes / 134 MB
Table 1: Runtimes and memory usage for different styles of simulation for two FPGA designs
Looking at the Verilog design, a somewhat larger and more complex data-path style of design, we targeted the Xilinx Virtex-4 LX80 FPGA. We split it into 14 sub-levels and one top level using the KEEP_HIERARCHY constraint to enable piece-wise timing simulation. We see larger runtimes than for the VHDL design but similar improvements. Performing a timing simulation of just the section that changed, compared to simulating the entire design, reduces runtime by 6.3x and simulation memory by 6.6x. Simulating the entire RTL design with just the changed portion replaced by a timing simulation netlist still shows a 3.6x runtime improvement and a 5.5x reduction in memory requirements.
In both designs the coverage for the changed module is exactly the same, and design debug was easier due to the faster runtimes and the smaller design to analyze. The simulator also felt more responsive, most likely due to the lighter memory requirements. We also noted that a lesser machine (slower, with less memory) could likely be used for this simulation with this methodology, which expands the resources available to verify the design and allows for parallel runs to further reduce the overall runtime.
Conclusion
In summary, this article covers methodologies for advanced verification using technology that is available today. This is by no means a revolutionary methodology, but rather one that most designers are either not fully aware of or do not fully understand. These techniques have been used in the past for different types of simulation and verification, but they may not have been used to their full potential. Hierarchical simulation can have an immense effect on how much time and effort it takes to completely verify a design. With the aid of this article, it should be possible to accomplish faster and more efficient timing simulation while reducing the simulation hardware requirements for future FPGA designs.
Premduth Vidyanandan is the technical marketing engineer for the Design Software Group at Xilinx. In this role, Vidyanandan is responsible for product definition, requirements and education for the simulation and verification solutions within the Xilinx ISE Design Tools, including the HDL simulation libraries and ISE Simulator. Vidyanandan joined Xilinx in 2001 and has held various positions in customer and product applications prior to his current position in technical marketing. He holds a bachelor’s degree in electrical engineering from Purdue University.