Much has been written in the past few months about the upcoming sunset of Moore’s Law. Stated succinctly, Gordon Moore predicted in 1965 that the number of transistors on an IC would double every 12 months¹. There have been other versions of his prediction, including shifts to a performance metric and altered timeframes. However, the message is clear: a key metric doubles on a regular schedule, and it has done so for many years.
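The compounding behind that schedule is easy to underestimate. The short sketch below computes the growth factor for a metric that doubles every T months, so the 12-month cadence can be compared with a slower one. The function name is hypothetical and chosen for illustration.

```python
def growth_factor(years: float, doubling_period_months: float) -> float:
    """Return how many times larger a metric becomes if it doubles every
    `doubling_period_months` months for `years` years."""
    return 2.0 ** (years * 12.0 / doubling_period_months)


if __name__ == "__main__":
    # Compare a 12-month doubling period with a 24-month one over a decade.
    for period in (12, 24):
        factor = growth_factor(10, period)
        print(f"doubling every {period} months: x{factor:,.0f} after 10 years")
```

Ten years at a 12-month cadence yields a 1,024× increase. Stretching the period to 24 months yields only 32×, so the choice of timeframe in the various restatements matters a great deal.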
Moore has also stated that no exponential growth can continue forever, and it appears we are nearing the end of conventional silicon scaling. Fundamental limits are now being approached both in planar silicon transistor technology and in on-chip interconnects. Furthermore, the cost of leading-edge photolithography has risen sharply as process nodes have shrunk.
Making faster, smaller transistors that don’t leak when turned off is becoming extraordinarily difficult and expensive. They need to be good switches, with low on-resistance and high off-resistance, and they must change state very quickly. Despite their largely planar structure, 45-nm-generation transistors are highly complex.
Such transistors combine strain engineering and raised source/drain structures with exotic gate stacks employing high-K gate dielectrics and metal replacement gates as the control electrode (Fig. 1)². The 32-nm process node will continue to feature planar transistors, but what happens at the 22-nm node and beyond isn’t so clear.
INTERCONNECT CHALLENGES
Interconnecting the transistors is also becoming more challenging. Interconnect delay and interconnect density are the primary issues to address³. As transistors scale, so must the on-chip electrical interconnects.
Which factor dominates a connection’s propagation delay depends on the distance between the two gates it links. For local wiring, the resistance of the driving transistor dominates. Once a connection exceeds a certain critical length, however, the resistance of the wire itself becomes the dominant factor.
The critical length is the length at which the RC delay of an unbuffered interconnect line equals the delay of a line of the same length with a buffer, driving a fanout of four, inserted at its midpoint (Fig. 2). As the process node shrinks, so does the critical length: 100 µm at the 65-nm node, 70 µm at 45 nm, and 50 µm at 32 nm (Fig. 3).
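The definition of critical length can be made concrete with a first-order model. The sketch below illustrates it under stated assumptions and is not the method behind Figs. 2 and 3. It treats the wire as a distributed RC line with the commonly used 0.38·r·c·L² delay and ignores driver and load resistance. It also lumps the mid-line buffer into a single delay t_buf that stands in for an FO4 delay. Setting 0.38·r·c·L² = 2·0.38·r·c·(L/2)² + t_buf gives L_crit = √(t_buf / (0.19·r·c)). The parameter values in the usage lines are hypothetical.

```python
import math

# Commonly used 50%-point delay coefficient for a distributed RC line (assumption).
DISTRIBUTED_RC_COEFF = 0.38


def unbuffered_delay(length_um, r_ohm_per_um, c_f_per_um):
    """Delay of a distributed RC wire of the given length, ignoring driver and load."""
    return DISTRIBUTED_RC_COEFF * r_ohm_per_um * c_f_per_um * length_um ** 2


def buffered_delay(length_um, r_ohm_per_um, c_f_per_um, t_buf_s):
    """Same wire split in half by one buffer whose delay is t_buf_s (e.g. an FO4 delay)."""
    half = length_um / 2
    return 2 * unbuffered_delay(half, r_ohm_per_um, c_f_per_um) + t_buf_s


def critical_length(r_ohm_per_um, c_f_per_um, t_buf_s):
    """Length (um) at which the unbuffered and mid-line-buffered delays are equal.

    Solving 0.38*r*c*L**2 = 2*0.38*r*c*(L/2)**2 + t_buf gives
    L = sqrt(t_buf / (0.19*r*c)).
    """
    return math.sqrt(t_buf_s / (DISTRIBUTED_RC_COEFF / 2 * r_ohm_per_um * c_f_per_um))


if __name__ == "__main__":
    # Hypothetical values: 2 ohm/um, 0.2 fF/um, 15 ps buffer delay.
    r, c, t_buf = 2.0, 0.2e-15, 15e-12
    print(f"critical length ~ {critical_length(r, c, t_buf):.0f} um")
    print(f"with 2x wire resistance ~ {critical_length(2 * r, c, t_buf):.0f} um")
```

Because L_crit grows as the square root of t_buf/(r·c), a rise in wire resistance per unit length shortens the critical length, and so does a faster buffer. Beyond L_crit, the buffered line is the faster of the two.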
As these longer wires are scaled, their resistance increases faster than their capacitance decreases, because fringing capacitance from the conductor sidewalls keeps the capacitance from falling in step with the wire’s cross-section. But another factor is looming on the scaling horizon: electron scattering in the interconnect metallization.
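A simplified per-unit-length model shows why resistance outruns capacitance. The sketch below is a textbook-style approximation and is not taken from this article. Resistance per metre is ρ/(w·h), and capacitance per metre counts only the plates above and below the wire plus its two sidewall neighbours. The default copper resistivity and dielectric constant are assumptions. Shrinking every dimension by the same factor s multiplies r by 1/s². It leaves c unchanged, because each capacitance term is a ratio of two dimensions that shrink together.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m


def wire_rc_per_metre(width, height, spacing, ild_thickness,
                      rho=1.7e-8, k_rel=3.0):
    """Return (r, c) per metre for a rectangular wire; dimensions in metres.

    rho and k_rel are assumed values (roughly copper and a low-k dielectric).
    Fringing beyond the four parallel-plate terms is ignored.
    """
    r = rho / (width * height)  # ohm/m
    c = EPS0 * k_rel * (2 * width / ild_thickness + 2 * height / spacing)  # F/m
    return r, c


def shrink(dims, s):
    """Scale every dimension by the same factor s (e.g. 0.7 per node)."""
    return [d * s for d in dims]


if __name__ == "__main__":
    base = [100e-9, 200e-9, 100e-9, 200e-9]  # hypothetical width, height, spacing, ILD
    r0, c0 = wire_rc_per_metre(*base)
    r1, c1 = wire_rc_per_metre(*shrink(base, 0.7))
    print(f"r x{r1 / r0:.2f}, c x{c1 / c0:.2f}, rc x{r1 * c1 / (r0 * c0):.2f}")
```

A 0.7× shrink roughly doubles r while c stays flat, so the RC product per unit length roughly doubles each node in this idealized picture.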
As the physical dimensions of a conductor approach the mean free path of its charge carriers (electrons), scattering at surfaces and grain boundaries increases greatly. This scattering impedes current flow, raising the conductor’s effective resistivity above its bulk value.
At 30-nm dimensions, this effect can more than double the resistivity of a copper interconnect relative to bulk copper (Fig. 4). Figure 5 shows the critical dimensions of NEC’s metallization structures as the process progresses from the 65-nm to the 32-nm node. Because these dimensions are approaching the mean free path, the phenomenon is now becoming an issue.
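The size effect can be estimated with two classic models. The sketch below combines a Fuchs-Sondheimer thick-film approximation for surface scattering with the Mayadas-Shatzkes formula for grain-boundary scattering, and simply adds their excess resistivities. Applying the surface term to both width and height is a rough extension. All parameters are assumptions:

- a bulk mean free path of about 39 nm for copper
- fully diffuse surfaces (p = 0)
- a grain-boundary reflection coefficient of 0.3
- grain size equal to the linewidth

This is not the model behind Fig. 4.

```python
import math

RHO_BULK_CU = 1.7   # uOhm*cm, approximate bulk copper at room temperature
MFP_CU_NM = 39.0    # nm, approximate electron mean free path in bulk copper


def surface_excess(width_nm, height_nm, p=0.0, mfp_nm=MFP_CU_NM):
    """Fuchs-Sondheimer thick-film term 3/8*(1-p)*mfp/d, summed over width and height.

    p is the fraction of electrons reflected specularly at a surface (0 = fully diffuse).
    """
    return 0.375 * (1.0 - p) * mfp_nm * (1.0 / width_nm + 1.0 / height_nm)


def grain_ratio(grain_nm, reflection=0.3, mfp_nm=MFP_CU_NM):
    """Mayadas-Shatzkes model: grain-limited resistivity divided by bulk resistivity."""
    a = (mfp_nm / grain_nm) * reflection / (1.0 - reflection)
    return 1.0 / (3.0 * (1.0 / 3.0 - a / 2.0 + a ** 2 - a ** 3 * math.log(1.0 + 1.0 / a)))


def resistivity_ratio(width_nm, height_nm, p=0.0, reflection=0.3):
    """Effective/bulk resistivity, adding the surface and grain-boundary excesses.

    Grain size is assumed equal to the linewidth.
    """
    grain_excess = grain_ratio(width_nm, reflection) - 1.0
    return 1.0 + surface_excess(width_nm, height_nm, p) + grain_excess


if __name__ == "__main__":
    for w in (100.0, 50.0, 30.0):
        ratio = resistivity_ratio(w, w)
        print(f"{w:.0f} nm: {ratio:.2f}x bulk, ~{ratio * RHO_BULK_CU:.2f} uOhm*cm")
```

The thick-film surface term is only approximate once dimensions fall to the mean free path, so treat the output as an order-of-magnitude illustration. Under these assumptions the ratio already exceeds 2× bulk at 30 nm.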
This is a serious technological challenge with no known solution⁴. On the one hand, we need scaled interconnect dimensions to pack more circuitry per unit area. On the other hand, if wire dimensions become too small, the conductor’s electrical properties degrade so much that shrinking can actually produce a slower chip. In such cases, it makes more sense to use vertical connections within a 3D chip stack than to route high-speed signals across a large die.
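The geometric payoff of going vertical can be sketched with idealized numbers. Assume a square die whose area is split evenly across n stacked layers. The model ignores through-silicon-via area and partitioning constraints, both of which are significant in practice. Each layer’s edge then shrinks by √n, and so does the longest horizontal route. For an unrepeated RC wire, whose delay grows roughly with the square of its length, halving the route cuts that delay by about four times. The 100 mm² die is hypothetical.

```python
import math


def layer_edge_mm(total_area_mm2: float, n_layers: int) -> float:
    """Edge length of each layer when a square die's area is split over n_layers."""
    return math.sqrt(total_area_mm2 / n_layers)


def longest_route_mm(edge_mm: float) -> float:
    """Corner-to-corner Manhattan distance on a square layer."""
    return 2.0 * edge_mm


if __name__ == "__main__":
    for layers in (1, 2, 4):
        edge = layer_edge_mm(100.0, layers)  # hypothetical 100 mm^2 of logic
        print(f"{layers} layer(s): edge {edge:.2f} mm, longest route {longest_route_mm(edge):.2f} mm")
```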
POWER AND BANDWIDTH
Power is another serious concern that has been driving scaling. Historically, a given processor scaled to a more advanced process node dissipates less power at the same performance level. Holding power constant, performance can therefore be increased: the advanced process node is simply more energy-efficient.
Rather than relying on clock-rate increases alone, it is more power-efficient to hold clock rates to moderate levels and to scale performance by adding processor cores to the die. As a result, memory bandwidth requirements increase. Each core needs its own data, and now there are more cores to feed.
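The power argument follows from the switching-power relation P = a·C·V²·f. The sketch below rests on three hypothetical assumptions:

- supply voltage scales in proportion to clock frequency, down to a floor v_min (values are normalized to the nominal voltage)
- the workload parallelizes perfectly
- leakage is negligible

Under those assumptions, two cores at half the clock deliver the same aggregate throughput as one core at full clock for a quarter of the switching power. A voltage floor erodes much of that advantage.

```python
def dynamic_power(c_eff, voltage, freq, activity=1.0):
    """Switching power P = activity * C * V**2 * f."""
    return activity * c_eff * voltage ** 2 * freq


def config_power(n_cores, freq, f_nom=1.0, v_nom=1.0, c_core=1.0, v_min=0.0):
    """Total switching power of n_cores identical cores, each clocked at freq.

    Assumption: supply voltage scales linearly with frequency (v_nom at f_nom)
    but cannot drop below v_min. Leakage is ignored.
    """
    voltage = max(v_min, v_nom * freq / f_nom)
    return n_cores * dynamic_power(c_core, voltage, freq)


if __name__ == "__main__":
    one_fast = config_power(1, 1.0)                  # one core at the nominal clock
    two_slow = config_power(2, 0.5)                  # same throughput, two cores at half clock
    two_slow_floor = config_power(2, 0.5, v_min=0.8)  # voltage floor at 0.8x nominal
    print(f"two slow cores: {two_slow / one_fast:.2f}x power")
    print(f"with a 0.8x voltage floor: {two_slow_floor / one_fast:.2f}x power")
```

This prints 0.25x and 0.64x. Both configurations still beat the single fast core, but the floor shows why the savings shrink as supply voltages approach their practical minimum.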
On-chip caches reduce the average external bandwidth requirement across a broad mix of workloads, but they cost die area and power. Because caches tend to be tightly coupled to a processing unit, putting them on-chip tends to spread the cores apart, aggravating the interconnect length/delay problem.
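A first-order way to see what on-chip caches buy is that only misses leave the die. In the hypothetical sketch below, each miss fetches one cache line and write-back traffic is ignored. The access rate, line size, and miss rates are illustrative assumptions, not measured figures.

```python
def external_bandwidth(n_cores, accesses_per_s, miss_rate, line_bytes=64):
    """Off-chip read traffic in bytes/s when each cache miss fetches one line.

    Write-backs and prefetching are ignored (simplifying assumption).
    """
    return n_cores * accesses_per_s * miss_rate * line_bytes


if __name__ == "__main__":
    # Hypothetical: 1e9 memory accesses per second per core.
    for miss_rate in (0.10, 0.02):
        for cores in (4, 16):
            gbytes = external_bandwidth(cores, 1e9, miss_rate) / 1e9
            print(f"{cores:2d} cores, {miss_rate:.0%} misses: {gbytes:.1f} GB/s")
```

A larger cache lowers the miss rate, and off-chip traffic falls with it. The traffic still scales linearly with core count, however, which is the pressure the move to more cores adds.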
As the industry transitions from multi-core to many-core architectures, this bandwidth explosion must be addressed effectively if these architectures are to reach their potential. For example, Intel has reported that an 80-core processor needs approximately 1 Tbit/s of external cache memory bandwidth.
Adequately sized third-level caches aren’t practical to integrate on-chip, so there is no choice but to implement them on separate die. But delivering that much bandwidth to a single CPU chip is challenging from both a signal-interconnection and a power perspective.
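The scale of the problem is easy to quantify with simple arithmetic. The sketch below splits the reported 1 Tbit/s across 80 cores. It then counts how many signal lanes a package would need at an assumed per-lane data rate. The lane rates are hypothetical figures chosen for illustration, and real links add encoding and clocking overhead that this ignores.

```python
def per_core_bandwidth(total_bps: int, n_cores: int) -> float:
    """Average share of the external bandwidth for each core, in bit/s."""
    return total_bps / n_cores


def lanes_needed(total_bps: int, lane_bps: int) -> int:
    """Number of signal lanes required at lane_bps each (ceiling division)."""
    return -(-total_bps // lane_bps)


if __name__ == "__main__":
    total = 10**12  # 1 Tbit/s, as reported for the 80-core design
    print(f"per core: {per_core_bandwidth(total, 80) / 1e9:.1f} Gbit/s")
    for lane in (5 * 10**9, 10 * 10**9):  # hypothetical per-lane data rates
        print(f"{lane // 10**9} Gbit/s lanes: {lanes_needed(total, lane)} needed")
```

Each core’s share works out to 12.5 Gbit/s. Even at 10 Gbit/s per lane, the chip needs 100 high-speed lanes, and twice as many wires if each lane is differential. Every one of those lanes also carries its own I/O power cost.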