With each new chip that comes out, benchmarks become less and less relevant, and are now almost pointless. The problem is how you measure a system’s performance and quantify the results when it does more than you need.
There is one central problem with benchmarks, and it is the stuff of nightmares for Intel, AMD, and the rest, that is “good enough”. Good enough is not just a quick phrase to describe a complex problem, for once, the underlying problem is actually not that complex. If your PC, phone, GPU, or anything else under the sun does the job, why do you need more?
Examples of this abound. If your refrigerator is running nicely, is big enough for your needs, and doesn’t make funny noises, why do you need a newer one? You might be able to get one that looks a bit fancier, or it may save you $0.17/month in electricity, but is either worth spending $750 for? Not unless you have more money than brains.
Think about your tools, and I mean the good old fashioned hammer, screwdrivers, and wrenches. If you don’t make your living with them, you probably have a bunch of hand tools that you have owned for a long time, possibly given to you by your parents, grandparents, or simply purchased years ago. If Stanley comes out with a new hammer that is 17% lighter, has a provably 31% grippier handle, and lasts 14x as long, will you rush out and buy one? A carpenter might, but a sane person probably wouldn’t until their hammer broke. Like the refrigerator, hand tools are good enough.
Fortunately for the semiconductor industry, suppliers, upstream and downstream food chains, and the associated ecosystem, there has always been a clear use case for more CPU and GPU power. The rush to mobile computing, be it phones or laptops, has pulled energy use into the mix too, but for the most part, that is a facet of speed/performance. You can almost always trade speed for energy use, or burn more and get more, within limits.
Having been around PCs since the early 1980s, I can say a new CPU was always a good thing. When you moved from 8MHz to 12, you really saw a difference. Your text didn’t lag keypresses, spell checks could be done in the background, and graphics became possible. With each increase in clock, you could see a large double digit increase in performance, and there was always a use for it.
Memory and hard disk space were also severe constraints, going from 16K to 48K of RAM could make things fly. Doubling it again would allow for non-text graphics in the real world. A 20MB HD may seem tiny now, but without the software bloat of Windows, it was voluminous, especially in the days when you could fit multiple games on an 80KB floppy.
This history lesson may sound much akin to kids bragging outside the general store about the quality of their horseshoes, and how it made their buggy faster, but there has been a linear path from CPU performance, memory size, and storage space to what the user could do. This curve lasted 30+ years, probably longer if you go back to the pre-PC days, but that is harder to trace directly to the user of today.
Starting about 5 years ago, and gaining speed with startling rapidity, this train went off the rails for PC users, but not for servers. Yet. At some point in this time, CPU speed became good enough. Storage became good enough too, and memory hit a more than sufficient point. It started to become obvious about the time quad-core CPUs came out, you could see reviewers wondering why two more cores meant slower real-world performance. Shortly after that, more storage space started to mean a higher percentage of free HD space that remained empty until you replaced the machine, and more memory did nothing at all. GPUs are just reaching this plateau now, within a generation, they will be pointless too.
For CPUs, the culprit is mainly single threaded performance, it has barely grown at all in recent years. The CPU makers have thrown cores at the marketing people who have in turn spun it as something somehow useful even if they can’t tell you why. Running 16 copies of Linpack on your 16-thread laptop will get you 16x the speed of a single copy on a single core, but so? Will you really be running 16 copies of anything on your laptop at once? Will you ever be running 16 different anythings that take up significant CPU time?
The user is the weak link here. If your CPU can process 8 HD video streams at once, can you watch them all? At once? If the next generation doubles that to 16 streams, so? Wouldn’t you rather have a smaller, cheaper CPU, or better yet, one with more battery life? Sure. And that is what is happening now, the core count has stagnated at 4 because more just isn’t useful for the mainstream. You can get more, but for most users, the extra cores will sit idle.
Battery life is the same. We are now at the point where you can get hours on a charge, and the energy used by CPUs at low utilization is in the low single-digit watts. Peak and/or TDP may be 25, 35, or more watts, but that is rarely seen for human measurable time frames now. For most work, CPUs use an order of magnitude less power. And the next generation will drop that once again. And the following one will too.
This is a good thing, right? Halving CPU power use in the real world will double battery life, right? Wrong. If you look at the power used by a laptop or phone, the overwhelming majority of it is usually screen power. If your CPU uses 3W on average, the rest of the machine 2W, and the screen 10W, halving the CPU power gets you 10% more battery life. Is it worth it to buy a new laptop for 10% more battery life when the next generation chip comes out? Maybe, but doubtful for the overwhelming majority, a bigger battery would cost far less and provide far more real world run time.
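For anyone who wants to check that math, here is a minimal sketch of it in Python. It assumes a fixed battery capacity and a constant average draw, so run time is simply capacity divided by total watts. The function name and parameters are made up for illustration, and the wattages are the example figures from the paragraph above, not measurements.

```python
def runtime_gain(cpu_w, rest_w, screen_w, cpu_scale):
    """Fractional increase in battery run time when CPU power is scaled.

    Assumes run time = battery capacity / average draw, with the capacity
    unchanged, so the capacity cancels out of the ratio.
    """
    before = cpu_w + rest_w + screen_w              # total draw today
    after = cpu_w * cpu_scale + rest_w + screen_w   # total draw with the new CPU
    return before / after - 1.0


# Example figures from the text: 3W CPU, 2W rest of machine, 10W screen,
# and a next-generation CPU that uses half the power.
gain = runtime_gain(3.0, 2.0, 10.0, 0.5)
print(f"Extra battery life: {gain:.1%}")
```

Total draw falls from 15W to 13.5W, a 10% cut that works out to about 11% more run time. Under the same assumptions, halving the 10W screen instead would give 50% more run time, which is why the CPU is the wrong place to look.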
How about for performance? It would be faster, right? Sure, 10% faster. Maybe. Would you notice? Your load times would still be gated by your HD, your web browsing would still be limited by your ISP connection, and everything else would be stopping you long before you ever noticed 10% more speed. The CPU is good enough. This scares the hell out of Intel and AMD.
Hard disks are the same. Terabyte hard drives now cost well under $100 retail. What can you do to fill that? About the only thing a single user can do is movies, and for that, you almost need to pathologically collect HD content. If a 2 hour 1080p film takes about 15GB, you can store 50+ on your HD while still having room for your OS, games, programs, and anything else you want. At 1MB a minute, a TB holds roughly a million minutes of music, close to two years of listening around the clock.
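The film count is easy to sanity check. The sketch below is only an illustration: it assumes a decimal terabyte (1000GB), and the 200GB set aside for the OS, games, and programs is a number picked for the example, not one from any real system.

```python
def films_that_fit(drive_gb, film_gb, reserved_gb):
    """Whole films that fit on a drive after reserving space for everything else."""
    usable_gb = drive_gb - reserved_gb
    return int(usable_gb // film_gb)


# 1TB drive, 15GB per 2 hour 1080p film (figure from the text),
# 200GB reserved for OS, games, and programs (assumed for illustration).
films = films_that_fit(1000, 15, 200)
print(f"Films that fit: {films}")
```

With those assumptions the drive holds 53 films, in line with the 50+ figure, and 66 if nothing else is on the disk.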
If you do fill it, most modern PCs can RAID four 2TB HDs with ease, and then video starts to hit the ‘not in a human lifetime’ viewing problem. 2TB HDs cost $119 as we speak. Memory has hit the same wall, it is hard to run out of room in 8GB, currently $36 retail, but if you do, it is unusual to find a PC without two free slots to add more. 64GB is not a trick in many systems, but just try to fill it without making up silly use cases. If you don’t use Windows, 8GB is severe overkill for non-server uses, it is hard to justify even this currently mid-spec configuration.
GPUs are a slightly different problem, they are just running into the performance wall. A modern high end GPU, such as the AMD 7970 or Nvidia GTX680, will push out frames on any modern game at speeds higher than any single monitor can display. These boards are more than capable of running games across three screens, six for AMD, and doing it at acceptable frame rates.
CPUs are good enough. HDs are good enough, and RAM is good enough. GPUs are almost good enough, and the dual cards that will be released in a few months are already having to justify themselves. With each new release, the few remaining users that could justify more performance are subsumed by good enough, their numbers are shrinking rapidly.
You might recall this article started out by talking about benchmarks. If you have hardware that is good enough, what does a benchmark do? If your GPU can spit out frames of a game at 172FPS, but your monitor can display only 60FPS, you need the next generation because...? If your CPU has average utilizations of less than 25%, doubling the speed of it will do what for you? How much do you want to spend for HD space you will never use? If you are at a buffet, and there is more food than you can physically eat, doubling available food won’t do much, nor will 198FPS, even if it is 26 more than last generation’s 172FPS.
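The frame rate point can be put in a few lines of Python. This is a deliberately simplified sketch: it assumes the only thing that matters is how many distinct frames the monitor shows per second, and it ignores latency and frame pacing. The function name is made up for illustration.

```python
def visible_fps(rendered_fps, refresh_hz):
    """Frames per second the user can actually see on a fixed-refresh monitor."""
    return min(rendered_fps, refresh_hz)


last_gen = visible_fps(172, 60)   # last generation GPU on a 60Hz monitor
next_gen = visible_fps(198, 60)   # next generation GPU on the same monitor

benchmark_gain = 198 - 172            # what the benchmark chart reports
visible_gain = next_gen - last_gen    # what reaches your eyes

print(f"Benchmark gain: {benchmark_gain} FPS, visible gain: {visible_gain} FPS")
```

The benchmark reports 26 more frames per second, and the monitor shows exactly none of them.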
Modern benchmarks measure numbers, and do it quite well. They will show you exactly how much time X takes, or how much throughput your Widget XRT has, but your PC spends most of its time waiting for the lump of meat behind the keyboard to input something. Showing how long your PC is waiting for the next key stroke isn’t exactly a telling revelation.
Modern benchmarks do give you information, but they can’t tell you the most obvious thing to any user, is a part good enough? Any user can tell that immediately, but a benchmark can’t. As each new generation pushes the bar farther and farther up, it is hard to find a modern PC that isn’t good enough at everything. Luckily, there is a faster generation coming real soon. Unfortunately, with the exception of GPUs, it doesn’t matter.
I will posit that for 99+% of users, more CPU speed just doesn’t matter. Everything but the bottom basement crap and hamstrung Celerons is more than sufficient, and the price difference between that and the same chip with a few fewer fuses blown is almost irrelevant. If you run benchmarks on both chips, they will give you numbers, but not what you need.
So what do you do? How do you decide? Some companies are talking about ‘experiential benchmarks’, basically a benchmark that measures the user experience. This is, well, silly, because you end up defining things like average use and typical user. Average and typical are not words that lend themselves very well to the real world behaviors of complex biological systems like people. I guarantee that a typical 15 year old girl does not ‘typically’ do the same things on a PC as a 45 year old man.
Some are talking about user configurable benchmarks, ones where you set your use cases and it runs numbers based on those. Once again, they might spit out numbers based on what you do, but they can’t even begin to quantify good enough. Even if it is easy to tell within a few minutes of booting a system, good enough is almost impossible to measure with a program.
Benchmarks can measure what you tell them to, customize themselves to your workload, your neighbor’s, or that of any average, typical, or mediocre individual, but they can’t tell you what you need to know. Benchmarking the next generation of hardware will clearly show 17% faster CPUs, 49% faster GPUs, double the HD space, and 4x the DRAM. So? Reviews will blather on about how much more X Y has, but never mention if it is good enough. That is what you need to know, and it can’t be measured in the now useless traditional ways. What do we do now? S|A
Update: Fixed GB/TB spelling error.