ITC keynoter comments on 75-GFLOPS/W compute target
Performance and reliability are increasingly interdependent, according to Pradip Bose, manager, Department of Power- and Reliability-Aware Microarchitectures, IBM Thomas J. Watson Research Center. In a Thursday morning International Test Conference keynote address titled “Efficient Resilience in Future Systems: Design and Modeling Challenges,” he described an IBM-led DARPA PERFECT (Power Efficiency Revolution for Embedded Computing Technologies) initiative, which aims to achieve reliable compute power of 75 GFLOPS/W. IBM is working with Stanford, Harvard, and UVA on the project.
Bose defined fault tolerance as the ability to provide service despite hard or soft faults generated inadvertently or maliciously. Classical fault tolerance, he said, refers to tolerance to faults that conform to particular fault models, but there is a need to move beyond this definition to contend with faults that weren't anticipated in the development of initial specifications. He described energy-secure systems that are resilient to corner-case scenarios or attacks.
He noted that in traditional designs, power can increase without a corresponding increase in instructions per cycle (IPC), leading to the power/performance wall. He added that special-purpose, workload-optimized, throughput-oriented high-performance-computing (HPC) chips quadruple in performance every two years, while general-purpose processors only double over the same period. He cited a target of 50 GFLOPS/W for exascale systems by 2020—not far from the PERFECT program's target.
He then described a reliability wall. Resilient systems, he said, can roll back to a golden (known-good) state on encountering an error. Unfortunately, as the number of processors increases, the number of rollbacks also increases, until an application can no longer run to completion; system MTBF can shrink to minutes.
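The rollback arithmetic behind that reliability wall can be sketched with a toy model (the numbers below are illustrative assumptions, not figures from the keynote): if each of N nodes fails independently with a given mean time between failures, the system-level MTBF falls roughly as 1/N, so at extreme scale it drops from years to minutes.

```python
# Toy reliability model (illustrative numbers, not from the keynote):
# assuming independent node failures, system MTBF ~ per-node MTBF / node count.

def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """System-level mean time between failures under independent node failures."""
    return node_mtbf_hours / nodes

node_mtbf = 5 * 365 * 24  # assume each node fails about once in 5 years (43,800 h)
for nodes in (1_000, 100_000, 1_000_000):
    mtbf = system_mtbf_hours(node_mtbf, nodes)
    print(f"{nodes:>9} nodes -> system MTBF ~ {mtbf * 60:10.1f} minutes")
```

At a million nodes the hypothetical per-node MTBF of five years yields a system MTBF of under three minutes, which is the regime where checkpoint/rollback overhead can prevent an application from ever finishing.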
Approaches to solving such problems, he said, include augmenting cores with parity checks, but a research challenge lies in determining where to insert error-checking functions—and how many.
He next commented on the power wall. Engineers are pursuing extreme measures to reduce power consumption, he said, but such measures (lower voltages, for example) can increase soft-error rates, though on the plus side they may help prevent hard errors.
The PERFECT program's goal of 75 GFLOPS/W will probably be achievable at the 7-nm process node, he said. To overcome the challenges of power and reliability, he concluded, cross-layer modeling and optimization is the key.
A guiding principle in the IBM-led effort, he said, is informed by a quote from Albert Einstein: “Everything should be made as simple as possible, but not simpler.”