Many of the metrics that companies currently use to determine and market their
products' performance have become outmoded, and performance measurement needs a
new approach. Measurements commonly associated with system performance (CPU clock
rate, amount of main memory, storage capacity, theoretical aggregate bandwidth
between the CPU and its primary memory subsystem) are no longer as relevant to most applications
as they once were. Anyone who believes that these easy-to-quantify metrics will
accurately predict system performance likely hasn’t performed many such
measurements on real systems.
Focusing just on performance (not ease of use, maintainability, or other broader
aspects), customers really only care about how quickly their applications run.
And the vast majority of applications see no system performance increase once
the basic subsystems have reached a certain minimum threshold, which in most
cases we’ve already achieved.
Take CPU clock rates. A CPU running at some mind-boggling megahertz rating
provides less value if it needs such a large fan you can’t hear yourself
think, especially if the CPU is spending most of its time in the idle loop.
Focusing too much on any one component is like playing the fastest running back
on the football team even though he fumbles the ball most of the time. The outcome
will disappoint you. If you look at how real systems operate today, the performance
is determined not by the individual ratings of the underlying building blocks,
but rather by how well the subsystems work together, in other words, by the interconnect.
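The idle-loop point can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a run splits cleanly into time spent computing and time spent waiting on other subsystems, and that a faster clock shortens only the computing part. The function names and the 2-second/8-second workload are made up for illustration.

```python
def run_time(cpu_seconds, wait_seconds, clock_speedup):
    """Total run time when only the CPU-bound part scales with clock rate."""
    return cpu_seconds / clock_speedup + wait_seconds


def overall_speedup(cpu_seconds, wait_seconds, clock_speedup):
    """Ratio of the original run time to the run time with a faster clock."""
    before = run_time(cpu_seconds, wait_seconds, 1.0)
    after = run_time(cpu_seconds, wait_seconds, clock_speedup)
    return before / after


# Hypothetical workload: 2 s of computing, 8 s of waiting on other subsystems.
# Doubling the clock rate only halves the 2 s part.
print(round(overall_speedup(2.0, 8.0, 2.0), 3))
```

With these made-up numbers, doubling the clock rate yields an overall speedup of only about 1.11, because the 8 seconds of waiting are untouched.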
Looking at it another way, today’s systems operate much like the networks
of yesterday. Just as a higher-speed local-area network connection improves
network performance, a better interconnect improves system performance, although
the interconnect issues inside a system are less obvious.
I’m not suggesting we just replace tried-and-true measurements such
as a CPU’s clock speed with some theoretical aggregate interconnect bandwidth
measurement. That would merely swap one misleading metric for another. To determine
whether a measurement has real value, we need to look more deeply. Sometimes,
very low initial latency is just as important to a system as its potential throughput.
If an amazingly fast CPU is waiting for a reply to a query before handling its
next operation, it doesn’t matter how much extra interconnect bandwidth
is available, since it’s not being used. An interconnect’s overhead
is just as important, since usable bandwidth, not raw bandwidth, largely determines system performance.
A protocol like Ethernet, for instance, has high latency and uses up much of
its bandwidth for overhead, so it’s a poor choice for subsystem interconnection.
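To see how latency and overhead eat into raw bandwidth, consider a rough sketch. It assumes the worst case described above, where each transfer must complete before the next one starts, and it treats latency as a fixed per-transfer delay and overhead as extra bytes on the wire. The function name, the parameter names, and every number are illustrative assumptions, not figures for Ethernet or any other real interconnect.

```python
def effective_throughput(payload_bytes, overhead_bytes, latency_s, raw_bytes_per_s):
    """Payload bytes delivered per second when transfers run strictly one at a time."""
    wire_time = (payload_bytes + overhead_bytes) / raw_bytes_per_s
    return payload_bytes / (latency_s + wire_time)


# Hypothetical link: 64-byte payloads, 16 bytes of overhead each,
# 1 microsecond of latency, 1 GB/s of raw bandwidth.
base = effective_throughput(64, 16, 1e-6, 1e9)

# Option 1: make the link ten times faster, latency unchanged.
faster_link = effective_throughput(64, 16, 1e-6, 1e10)

# Option 2: keep the link speed, cut latency to a tenth.
lower_latency = effective_throughput(64, 16, 1e-7, 1e9)

for label, value in [("base", base),
                     ("10x raw bandwidth", faster_link),
                     ("1/10 latency", lower_latency)]:
    print(f"{label}: {value / 1e6:.1f} MB/s of payload")
```

Under these assumptions, a tenfold increase in raw bandwidth raises the delivered payload rate by less than 10 percent, while a tenfold cut in latency raises it roughly sixfold.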
Looking deeper into the subsystem, we can see the importance of interconnect
“blocking.” Blocking occurs when one piece of data gets stuck and
prevents another from getting through, even if the second piece has a clear
channel in front of it. The likelihood of interconnect blocking depends a great
deal on how the protocol is specified and on how the hardware is designed. It’s
not difficult to measure things such as blocking, as long as they’re identified
in the system design process.
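A toy model makes blocking easier to picture. The sketch below compares a single shared queue, where only the packet at the head may be sent, with a scheme that lets the oldest packet with a ready destination go first. It is a simplified illustration and does not represent how PCI Express or any specific protocol behaves; the function names, the one-packet-per-tick link, and the `ready_at` table are all hypothetical.

```python
from collections import deque


def deliver_fifo(packets, ready_at):
    """Single shared queue: only the head packet may be sent each tick."""
    queue = deque(enumerate(packets))
    done, tick = {}, 0
    while queue:
        index, dest = queue[0]
        if tick >= ready_at.get(dest, 0):
            queue.popleft()
            done[index] = tick  # tick at which this packet was delivered
        tick += 1
    return done


def deliver_bypass(packets, ready_at):
    """Each tick, send the oldest packet whose destination is ready."""
    pending = list(enumerate(packets))
    done, tick = {}, 0
    while pending:
        for position, (index, dest) in enumerate(pending):
            if tick >= ready_at.get(dest, 0):
                done[index] = tick
                del pending[position]
                break  # the link carries one packet per tick
        tick += 1
    return done


# Hypothetical traffic: destination "A" cannot accept data until tick 5,
# while destination "B" is ready from the start.
packets = ["A", "B", "B", "B"]
ready_at = {"A": 5}

fifo_result = deliver_fifo(packets, ready_at)
bypass_result = deliver_bypass(packets, ready_at)
print("shared FIFO:", fifo_result)
print("with bypass:", bypass_result)
```

In the shared queue, the three packets for "B" are delivered at ticks 6, 7, and 8 even though their destination was free all along. With bypass they go out at ticks 0, 1, and 2, and the packet for "A" still arrives at tick 5.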
Many other metrics are better suited to predicting system performance, and
each is readily quantified once you determine what you need to look
for. Luckily, there’s an easier way to ensure better overall system performance: a
state-of-the-art interconnect such as PCI Express. It provides the high-bandwidth,
low-latency, low-overhead, and blocking-resistant features ideal for modern
subsystems. Furthermore, PCI Express silicon is readily available now in a wide
range of application-friendly configurations.
This new approach to performance measurement doesn’t mean you no longer
need to think about system-level performance. But it does give you a significant
head start in your design activity.
See the figure