Thermal Analysis and Other Simulation Types

Thermal analysis is critical in determining potential failure mechanisms in ICs, packages, and boards. What is often less well understood is how thermal analysis ties in with electrical and electromagnetic analyses. This article digs into those dependencies.

Byron Blackmore

July 27, 2011

Fig 1. This image shows the temperature distribution on the top surface of a USB memory stick PCB.

Fig 2. Shown is an isolated view of thermal bottleneck distribution in the top layer of the memory stick PCB shown in Fig. 1. White and yellow areas indicate areas in which heat flows are most constrained.

Fig 3. Shown is a typical neck-down area on a power distribution net. The net in this view is color-graded by current density.

Fig 4. Shown is a typical heat sink geometry for a power amplifier module. The large heat sink can mechanically stress the board and its connections.

The primary function of thermal analysis is to predict the temperatures of components and parts within a product. By visualizing heat fluxes, thermal bottlenecks, and missed shortcut opportunities, thermal analysis seeks to eliminate any detected thermal compliance issues.

These temperature predictions are important to other analysis disciplines as well, because many real-world engineering materials have temperature-dependent thermo-physical properties. Temperature effects can be critically important, especially in power distribution, signal integrity, and signal timing. Copper’s electrical resistance, for example, increases with temperature even within common design temperature ranges. Moreover, there may be tradeoffs when deciding what is good for thermal performance and what is good for the rest of the design. This article will discuss how thermal analysis results can influence other forms of analysis and the design tradeoffs that may result.

Thermal Analysis Moves Ahead

For the past 20 years, computational fluid dynamics (CFD) techniques have provided 3D conjugate thermal simulation results that predict and display temperatures in and around electronic product designs. Thermal designers routinely use predicted temperatures to judge thermal compliance, simply by comparing the simulated temperatures to maximum rated operating temperatures. If the operating temperature exceeds the maximum rated value, the result is at best a potential degradation in the performance of the packaged IC and at worst an unacceptable risk of thermo-mechanical failure. These techniques are commonplace today, with widespread adoption across the electronics sector, including heavy usage in semiconductors, telecommunications, automotive, aerospace, and consumer products.
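
The compliance check described above amounts to subtracting each simulated temperature from its rated maximum and flagging negative margins. The following is a minimal sketch of that comparison; the component names and temperatures are hypothetical values chosen for illustration, not results from any real design or tool.

```python
def thermal_margins(simulated_c, rated_max_c):
    """Return rated maximum minus simulated temperature (deg C) per component."""
    return {name: rated_max_c[name] - temp for name, temp in simulated_c.items()}


def non_compliant(simulated_c, rated_max_c):
    """List components whose simulated temperature exceeds the rated maximum."""
    margins = thermal_margins(simulated_c, rated_max_c)
    return sorted(name for name, margin in margins.items() if margin < 0)


# Hypothetical values for illustration only.
simulated = {"controller": 92.0, "flash": 71.5, "regulator": 128.0}
rated_max = {"controller": 105.0, "flash": 85.0, "regulator": 125.0}

print(non_compliant(simulated, rated_max))
```

A positive margin only says the part is within its rating; it does not explain why the temperature field looks the way it does.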

The typical visualization of the predicted temperature field for a printed-circuit board (PCB), shown in Figure 1, provides useful information. However, the latest advances in thermal simulation also offer the calculation and display of thermal bottlenecks and shortcut opportunities (Figure 2). These offer insight into why certain temperature distributions occur and how best to resolve thermal issues.

Electronic Materials and Temperature

Frequently the variance in thermo-physical properties for a substance is large enough across the expected temperature range to be a first-order design effect. A common example is the thermal conductivity of silicon, which decreases by approximately 20% as temperature increases from 350 K (~77°C) to 400 K (~127°C). This of course has the tendency to exacerbate thermal problems at the die level. The hotter the die becomes, the more difficulty heat has in exiting the die due to the lower thermal conductivity value. This effect is often described as a ‘thermal runaway’ scenario.
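
To get a feel for the size of this effect, the conductivity of silicon can be approximated with a simple power law in absolute temperature. The sketch below is an illustration only: the 300 K reference value and the exponent are assumed fit coefficients, not values from this article, and a real analysis would use measured or vendor-supplied data.

```python
K_SI_300 = 148.0      # W/(m*K), approximate bulk silicon conductivity at 300 K (assumed)
EXPONENT = -4.0 / 3   # assumed power-law exponent for this sketch


def silicon_conductivity(temp_k):
    """Approximate thermal conductivity of silicon at temp_k (kelvin)."""
    return K_SI_300 * (temp_k / 300.0) ** EXPONENT


def relative_drop(t_low_k, t_high_k):
    """Fractional decrease in conductivity when heating from t_low_k to t_high_k."""
    k_low = silicon_conductivity(t_low_k)
    k_high = silicon_conductivity(t_high_k)
    return (k_low - k_high) / k_low


print(f"Drop from 350 K to 400 K: {relative_drop(350.0, 400.0):.1%}")
```

With these assumed coefficients the fit gives a drop of roughly 16% between 350 K and 400 K, the same order as the figure quoted above.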

Copper is used extensively throughout the electronics industry, and it too can show significant thermo-physical property changes over the expected range of operating temperatures. For example, the electrical resistivity of copper increases approximately 4% for every 10°C temperature rise within typical temperature ranges. That equates to roughly a 32% variation in resistivity over an 80°C span. This has a substantial effect on the DC resistance of the copper in the board and significantly affects the voltage drop and current density within the board. Because Joule heating is driven directly by current density and resistivity, temperature affects the power distribution, and the power distribution in turn affects temperature.
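
The resistivity figures above follow from the usual linear temperature-coefficient model for a metal. Here is a minimal sketch of that arithmetic, assuming textbook values for copper (about 1.68e-8 ohm·m at 20°C and a coefficient of about 0.0039 per °C); the trace dimensions are hypothetical.

```python
RHO_CU_20 = 1.68e-8   # ohm*m, approximate resistivity of copper at 20 C
ALPHA_CU = 0.0039     # 1/C, approximate temperature coefficient near 20 C


def copper_resistivity(temp_c):
    """Linear model: rho(T) = rho_20 * (1 + alpha * (T - 20))."""
    return RHO_CU_20 * (1.0 + ALPHA_CU * (temp_c - 20.0))


def trace_resistance(length_m, width_m, thickness_m, temp_c):
    """DC resistance of a rectangular trace: R = rho * L / (w * t)."""
    return copper_resistivity(temp_c) * length_m / (width_m * thickness_m)


# Fractional resistivity rise over an 80 C span, starting from 20 C.
rise = copper_resistivity(100.0) / copper_resistivity(20.0) - 1.0
print(f"Resistivity rise from 20 C to 100 C: {rise:.1%}")

# Hypothetical trace: 50 mm long, 1 mm wide, 35 um thick.
for temp in (20.0, 100.0):
    r = trace_resistance(0.050, 1.0e-3, 35e-6, temp)
    print(f"R at {temp:.0f} C: {r * 1e3:.2f} mOhm")
```

With the assumed coefficient the 80°C rise comes out near 31%, in line with the rough 32% estimate in the text.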

This strong interaction is one of the greatest challenges in modern PCB design, as it adds complexity to any attempt to provide enough metal on the board for DC current needs. ‘Neck-downs’ in the power distribution network will cause locally increased current density and will induce large Joule heating terms, elevated temperatures, and associated changes in electrical resistivity.

Consider a typical neck-down (Figure 3) on a power distribution plane. A neck-down may be a narrow section of a plane, a via that connects the power supply to the plane or two planes together, or a narrow trace that is expected to carry tens of amperes. In severe cases, such neck-downs can act like fuses, which can lead to loss of power and even mechanical failure. At the very least, these neck-downs cause a rise in temperature on the board. The amount of temperature rise depends on how the surrounding metal is connected. Predicting the rise requires a detailed thermal simulation of the board.
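
The coupling between Joule heating and resistivity at a neck-down can be illustrated with a lumped fixed-point iteration: compute the resistance at the current temperature estimate, compute the dissipated power, update the temperature, and repeat. This is a rough sketch only. It assumes a single thermal resistance to ambient (`theta_c_per_w`, a hypothetical parameter), whereas the real temperature rise depends on the surrounding copper and needs a detailed board simulation.

```python
ALPHA_CU = 0.0039  # 1/C, approximate temperature coefficient of copper


def neckdown_temperature(current_a, r20_ohm, theta_c_per_w, ambient_c,
                         tol=1e-9, max_iter=200):
    """Iterate T = T_amb + theta * I^2 * R(T) until it stops changing.

    r20_ohm is the neck-down resistance at 20 C; theta_c_per_w is an assumed
    lumped thermal resistance from the neck-down to ambient.
    Returns (temperature in C, dissipated power in W).
    """
    temp = ambient_c
    for _ in range(max_iter):
        resistance = r20_ohm * (1.0 + ALPHA_CU * (temp - 20.0))
        power = current_a ** 2 * resistance
        new_temp = ambient_c + theta_c_per_w * power
        if abs(new_temp - temp) < tol:
            return new_temp, power
        temp = new_temp
    raise RuntimeError("no convergence: heating outpaces the assumed cooling path")


# Hypothetical neck-down: 2 mOhm at 20 C, carrying 10 A, 40 C/W to a 25 C ambient.
temp_c, power_w = neckdown_temperature(10.0, 0.002, 40.0, 25.0)
print(f"Neck-down settles near {temp_c:.1f} C while dissipating {power_w:.3f} W")
```

The converged temperature is higher than an estimate that evaluates the resistance once at ambient, because the hotter copper dissipates more power for the same current.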

Both power and temperature affect transistor operation. In fact, transistor performance is usually partitioned into PVT corner cases that map variations in process, voltage, and temperature. Voltage and temperature are greatly affected by the PCB design. Standard I/O buffer models known as IBIS models are used in system simulations to characterize buffers by using I-V (current-voltage) and V-t (voltage-time) tables for each of the different PVT corners. For example, a CMOS buffer has a maximum corner with I-V and V-t tables for the combination of fast process, high voltage, and low temperature. It is important to consider all these factors to properly account for the many I/O buffer performance variations that can arise.

At IC process technologies of 90 nm and below, leakage currents begin to cause appreciable additional heat sources. These leakage currents have a non-linear, increasing relationship with temperature. At these process scales, the temperature is required to evaluate the power dissipation, and vice versa, making the inclusion of this relationship in the thermal management scheme a necessity.
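
The mutual dependence of leakage power and temperature can be sketched with the same kind of iteration. The model below is an assumption made for illustration: leakage is taken to double for every fixed temperature increment (`doubling_c`), and the package is reduced to one junction-to-ambient thermal resistance. None of the numbers come from a real process or package.

```python
def operating_temperature(p_dynamic_w, p_leak_ref_w, theta_ja_c_per_w, ambient_c,
                          ref_c=25.0, doubling_c=25.0, limit_c=150.0,
                          tol=1e-6, max_iter=1000):
    """Find the junction temperature where heat generated equals heat removed.

    Assumed leakage model: P_leak(T) = p_leak_ref_w * 2 ** ((T - ref_c) / doubling_c).
    Returns the converged temperature in C, or None if the iteration climbs
    past limit_c (treated here as thermal runaway).
    """
    temp = ambient_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * 2.0 ** ((temp - ref_c) / doubling_c)
        new_temp = ambient_c + theta_ja_c_per_w * (p_dynamic_w + p_leak)
        if new_temp > limit_c:
            return None
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    return None


# Hypothetical device: 2 W dynamic power, 0.5 W leakage at 25 C, 25 C ambient.
stable = operating_temperature(2.0, 0.5, theta_ja_c_per_w=10.0, ambient_c=25.0)
runaway = operating_temperature(2.0, 0.5, theta_ja_c_per_w=40.0, ambient_c=25.0)
print("10 C/W cooling path:", stable)
print("40 C/W cooling path:", runaway)
```

With the better cooling path the iteration settles at a stable temperature; with the poorer one the leakage grows faster than the heat can be removed and no operating point is found below the limit.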

Temperature also has a powerful effect on mechanical stress and strain. Most materials experience a significant decrease in Young’s modulus (e.g., a drop of approximately 20% from 50°C to 100°C for Sn-3.5Ag solder) and a decrease in yield stress with an increase in temperature, as well as an associated rise in the coefficient of thermal expansion for that material. Including temperature effects during stress analysis simulation is critical to properly predict thermo-mechanical failure and reliability metrics.

It’s essential to be aware of these temperature effects throughout the design process. The temperature dependence of physical properties in common electronic materials means thermal design must be coordinated with the power distribution, signal integrity, and mechanical failure analyses. In addition there are many thermal issues that stem from the temperature dependency of material properties in other analysis and design disciplines.

Electrical and Thermal Tradeoffs

Component placement commonly involves tradeoffs between thermal and electrical disciplines. Often the ideal placement from an electrical perspective is the least desirable from a thermal management perspective.

A prime example is the placement of components as closely together as possible for purely electrical reasons. Shorter connections from pin to pin are generally good from a signal integrity standpoint. It is common practice to constrain routing with maximum allowable distances for particular connections, a direct result of prioritizing electrical considerations.

But this placement strategy may be in direct conflict with the thermal management ideal. Placing components close together results in increased power density locally and invariably leads to elevated temperatures among all the components in a group. When components are grouped tightly to improve electrical performance, the ‘thermal victim’ effect may appear in components that would not otherwise pose a thermal challenge.

A second example is the thermal rule of thumb that components with the largest thermal management issues (high power and power density) should be placed as near the leading edge of the board as possible, to receive the coolest possible air in a forced-convection cooling system. But this may be impractical from an electrical perspective when components from diverse functional partitions are grouped together for reasons of routing, timing, and signal considerations.

In the field of IC packaging, there is the design problem of hot spots (zones of elevated power density) aligned vertically in stacked-die devices. This can have a drastic effect on the peak silicon temperature as well as the temperature gradients present across the dice. Moving the hot spots so they do not stack vertically is a sound approach, but it can add electrical and manufacturing difficulties in packaging. Moreover, it requires careful planning in the functional partitioning of the active surfaces.

Electromagnetic Compliance and Thermal Tradeoffs

There are further design tradeoffs to be considered in the field of electromagnetic (EM) compliance and its relationship to thermal issues. Here too, design proposals aimed at improved EM containment can detract from thermal performance.

The concern applies to many aspects of EM design. Consider a vent or perforated plate design. From the perspective of EM compliance, each cooling vent in the chassis should have as small a free-area ratio as feasible; in fact the best-case scenario is often to eliminate vents altogether. But any reduction in the free-area ratio of a cooling vent will likely decrease the thermal performance of the design. A less open vent will impose additional flow resistance through the system, reducing the amount of air that moves through the chassis, whether the air movement is mechanical (fans or blowers) or buoyancy driven. A design compromise must be found that is acceptable to both design disciplines.

A second example is the use of shielding cans. Shielding cans are metallic enclosures that envelop ‘noisy’ components that pose particular difficulty for the EM design. While these cans can effectively attenuate the emissions from the components, they pose additional challenges to the thermal design. Placing a solid obstruction over a component effectively removes a heat transfer avenue by reducing convective heat transfer from the top of the component. This forces most of the heat to reach the ambient via the PCB, and changes to the local copper content and distribution in the form of fills and thermal vias may be needed to achieve satisfactory thermal performance.

A third example is found in heat sink design. Considering the thermal design only, a heat sink with more fins and more surface area is generally better at allowing heat to escape from a component. This isn’t always true of course, as heat sink geometry imposes an obstruction to air flow that must be considered, but for the purposes of this argument we’ll allow it. However, the EM compliance design may very well suffer as larger and larger heat sinks are proposed, as the heat sink may begin to serve as an ‘emissions antenna’ and exacerbate EMC problems. The best heat sink for the thermal design may not be ideal for the EMC design.

Stress and Thermal Tradeoffs

Heat sink design involves further tradeoffs when mechanical stress is considered. One common example is the size (and therefore the mass) of a heat sink. Usually a bigger and heavier heat sink will yield better thermal performance than a smaller, lighter one (again, ignoring the complexity posed by reduced flow rates and questions of cost effectiveness). However, the increased mass raises additional mechanical stress concerns. A heavier heat sink attached to a component may require additional mounting attachments. A vertically mounted heat sink may act as a ‘cantilever’, with increased stress at the heat sink attachment points. Figure 4 shows an example of such a heat sink design.

Further, the selection of materials can have important effects. Copper has excellent heat conduction characteristics (k ≈ 400 W/m·K) and is often used when thermal management is the foremost concern. However, use of copper may have drawbacks on the stress design. Copper’s coefficient of thermal expansion is approximately six times larger than that of silicon, which imposes additional challenges in some cooling strategies (through-silicon vias are one example), as the materials will tend to strain at greatly differing rates.
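
The expansion mismatch can be put into numbers with the free thermal strain relation, strain = CTE × temperature change. The sketch below assumes typical room-temperature handbook values for the two coefficients (about 17 ppm/°C for copper and about 2.6 ppm/°C for silicon); it ignores the temperature dependence of the coefficients and any constraint between the materials.

```python
CTE_CU = 17.0e-6   # 1/C, approximate coefficient of thermal expansion of copper
CTE_SI = 2.6e-6    # 1/C, approximate coefficient of thermal expansion of silicon


def free_thermal_strain(cte_per_c, delta_t_c):
    """Unconstrained thermal strain for a temperature change delta_t_c."""
    return cte_per_c * delta_t_c


def mismatch_strain(delta_t_c):
    """Difference in free thermal strain between copper and silicon."""
    return free_thermal_strain(CTE_CU, delta_t_c) - free_thermal_strain(CTE_SI, delta_t_c)


ratio = CTE_CU / CTE_SI
print(f"CTE ratio, copper to silicon: {ratio:.1f}")
print(f"Mismatch strain for a 100 C rise: {mismatch_strain(100.0):.2e}")
```

Where the two materials are bonded together, as in a through-silicon via, this strain difference cannot develop freely and appears as stress at the interface instead.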

Summary

Temperature predictions within electronic systems are still used primarily to compare the thermal performance of the design and judge thermal compliance. But it is vital to acknowledge the secondary effects of temperature on other design disciplines. Very often a design change targeted at improving thermal performance will have negative impacts on another aspect of the design because the properties of copper, silicon and other common materials have important—and differing—dependencies on temperature. These can complicate the design of power distribution, signal integrity, electrical timing, and mechanical stress solutions and should be built into the design as early as possible by utilizing thermal simulation techniques in parallel with other design flows.