In July, a University of Michigan computer engineering professor proposed a new way to measure the efficiency of a processor design. Todd Austin’s LEAN metric received both praise and skepticism, but even the critics understood the rationale: A lot of silicon is devoted to things other than actual computing. More than 95 percent of an Nvidia Blackwell GPU, for example, is given over to other tasks, Austin told IEEE Spectrum. Those parts aren’t doing unimportant work; choosing the next instruction to execute, for instance, has to happen somewhere. But Austin believes processor architectures can and should move toward designs that maximize computing and minimize everything else.
Todd Austin is a professor of electrical engineering and computer science at the University of Michigan in Ann Arbor.
What does the LEAN score measure?
Todd Austin: LEAN stands for Logic Executing Actual Numbers. A score of 100 percent—an admittedly unreachable goal—would mean that every transistor is computing a number that contributes to the final results of a program. Less than 100 percent means that the design devotes silicon and power to inefficient computing and to logic that doesn’t do computing.
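For a back-of-the-envelope sense of what such a score conveys, here is a minimal Python sketch that divides a hypothetical count of transistors doing useful arithmetic by a chip’s total transistor count. The numbers are placeholders, and the real metric’s accounting is more involved than this simple ratio.

```python
# Illustrative only: a back-of-the-envelope LEAN-style ratio.
# The transistor counts below are hypothetical placeholders, and the
# actual metric's accounting rules are more involved than this sketch.

def lean_score(computing_transistors: float, total_transistors: float) -> float:
    """Fraction of transistors doing useful arithmetic, as a percentage."""
    return 100.0 * computing_transistors / total_transistors

# Hypothetical example: 1 billion of 20 billion transistors doing real math.
print(f"{lean_score(1e9, 20e9):.1f}%")  # -> 5.0%
```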
What’s this other logic doing?
Austin: If you look at how high-end architectures have been evolving, you can divide the design into two parts: the part that actually does the computation of the program and the part that decides what computation to do. The most successful designs are squeezing that “deciding what to do” part down as much as possible.
Where is computing efficiency lost in today’s designs?
Austin: The two losses that we experience in computation are precision loss and speculation loss. Precision loss means you’re using too many bits to do your computation. You see this trend in the GPU world. They’ve gone from 32-bit floating-point precision to 16-bit to 8-bit to even smaller. These are all trying to minimize precision loss in the computation.
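To illustrate the bit-width trade-off Austin is describing, here is a small NumPy sketch (not from the interview) that does the same division in 32-bit and 16-bit floating point; the values are arbitrary.

```python
# Sketch of the precision/bit-width trade-off described above.
# Requires NumPy; the values are arbitrary illustrations.
import numpy as np

x = np.float32(1.0) / np.float32(3.0)   # 32-bit: ~7 decimal digits
y = np.float16(1.0) / np.float16(3.0)   # 16-bit: ~3 decimal digits

print(f"float32: {x:.10f}")         # ~0.3333333433
print(f"float16: {float(y):.10f}")  # ~0.3332519531
# If ~3 digits are enough for the workload, the extra float32 bits are
# "precision loss": silicon and energy spent on accuracy that the final
# result never needed.
```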
Speculation loss comes when instructions are hard to predict. [Speculative execution is when the computer guesses what instruction will come next and starts working even before the instruction arrives.] Routinely, in a high-end CPU, you’ll see two [speculative] instruction results thrown away for every one that is usable.
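Taking the two-discarded-per-one-kept ratio Austin quotes at face value, a few lines of Python show the implied fraction of speculative work that is actually used. This is simple arithmetic on that ratio, not a model of a real pipeline.

```python
# Toy model of speculation loss, using the 2-discarded-per-1-kept ratio
# quoted above. Arithmetic on that ratio, not a pipeline simulation.

def useful_fraction(kept: int, discarded: int) -> float:
    """Fraction of speculatively executed results that are actually used."""
    return kept / (kept + discarded)

# Two speculative results thrown away for every one that is usable:
print(f"{useful_fraction(1, 2):.0%} of speculative work is kept")  # -> 33%
```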
You’ve applied the metric to an Intel CPU, an Nvidia GPU, and Groq’s AI inference chip. Find anything surprising?
Austin: Yeah! The gap between the CPU and the GPU was a lot less than I thought it would be. The GPU was more than three times better than the CPU. But that was only 4.64 percent [devoted to efficient computing] versus 1.35 percent. For the Groq chip, it was 15.24 percent. There’s so much of these chips that’s not directly doing compute.
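For reference, a short sketch tabulating the scores Austin reports and the ratios they imply (the roughly 3.4x CPU-to-GPU gap he mentions); only the ratios are computed here, the percentages come from the interview.

```python
# LEAN scores quoted in the interview, with the ratios they imply.
scores = {"Intel CPU": 1.35, "Nvidia GPU": 4.64, "Groq chip": 15.24}  # percent

cpu = scores["Intel CPU"]
for chip, score in scores.items():
    print(f"{chip}: {score:.2f}% LEAN, {score / cpu:.1f}x the CPU")
# Nvidia GPU ~3.4x the CPU; Groq chip ~11.3x.
```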
What’s wrong with computing today that made you feel you needed to come up with this metric?
Austin: I think we’re actually in a very good state. But when you look at AI scaling trends, it’s very apparent that we need more compute, more memory capacity, and more memory bandwidth. And this demand arrives just as Moore’s Law is ending. As a computer architect, if you want to create a better computer, you need to take the same 20 billion transistors and rearrange them in a way that is more valuable than the previous arrangement. I think that means we’re going to need leaner and leaner designs.
This article appears in the September 2025 print issue as “Todd Austin.”