Nvidia’s Kepler and Maxwell barely beat Moore’s law

GTC looking more like last year, snowy

NVIDIA'S JEN-HSUN HUANG gave a masterclass in how to snow the trusting last Tuesday during the keynote of the company's GTC conference. He bent statistics well past the breaking point, and the tame press lapped it up, parroted it back, and didn't check a single stat. Mission accomplished.

Luckily, we picked up the baton that others dropped and actually did the math on the upcoming generations. Once you look at the numbers, the GPU families Jen-Hsun touted, Kepler and Maxwell, don't look so hot. OK, bad choice of words given their current product line, but don't let that turn you off. Let's look at what was promised versus reality.

If you watch the keynote here, the good stuff doesn't come until the very end, 1:40-1:47 or so. It isn't a bad keynote, so if you care about GPU compute, it is worth watching, but the parts relevant to this article are the last 7-8 minutes.

During the roadmap reveal, there were some very interesting things said, and not said. The most interesting is that Nvidia revealed their long term roadmap, something they have never done before. The reason for this is distraction; the last thing they want is a conference talking about how bad the current architecture is. Nothing distracts the press better than shiny, nebulous future promises that may, or, like last year, may not come true.

The roadmap in question

Jen-Hsun said that Fermi is rated at 768GF peak in double precision. That equates to a 512 shader chip where each shader does half a DP FMA, or one DP FLOP, per clock, running at a hot clock of 1.5GHz. Fair enough, that is exactly what we said they were aiming for last year. At least the company is starting to come clean on something; let's hope it is a trend.
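If you want to check that rating yourself, the arithmetic is trivial. Here it is as a quick Python sketch; the half-rate DP assumption is ours, based on how GF100's shaders work.

```python
# Back-of-the-envelope check of Fermi's claimed 768GF DP peak.
# Assumes GF100-style shaders: one SP FMA (2 FLOPs) per shader per
# clock, with double precision running at half the SP rate.
shaders = 512
hot_clock_ghz = 1.5
sp_flops_per_clock = 2       # one fused multiply-add per shader
dp_rate = 0.5                # DP at half the SP rate

peak_dp_gflops = shaders * sp_flops_per_clock * dp_rate * hot_clock_ghz
print(peak_dp_gflops)        # 768.0
```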

The Nvidia data sheets were here, but now seem to have been replaced with a far more optimistic version. The July version is markedly different in some critical areas like power consumption. The April version had the C2070 at 247W, the new one has it at <=238W. Elsewhere on the site it is also listed at 225W, but what the heck, let's just use 247W.

C1060 performance numbers

The previous generation C1060 is listed in the data sheets at 187.8W, but let's round to 188W. According to some Nvidia presentations, it is rated at 78GF DP FP. Doing some simple math gets you the following table. The 'Fermi' column is based on the numbers we heard were the goals pre-tape-out.

The raw numbers
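For anyone who wants to check our work, the math behind that table boils down to a few divisions. A sketch follows; the 515GF DP rating for the shipping C2070 comes from Nvidia's data sheets, and the 'goal' row assumes the pre-tape-out 768GF target at the same 247W, which is our guess, not Nvidia's number.

```python
# Perf/Watt math behind the table. Specs are from the Nvidia data
# sheets quoted above; the 'Fermi (goal)' row assumes the
# pre-tape-out 768GF target at the C2070's 247W.
cards = {
    "Tesla C1060":  (78,  188),   # (DP GFLOPS, Watts)
    "Fermi C2070":  (515, 247),
    "Fermi (goal)": (768, 247),   # hypothetical, our assumption
}
for name, (gflops, watts) in cards.items():
    print(f"{name}: {gflops / watts:.2f} GF DP/W")
# Tesla C1060: 0.41, Fermi C2070: 2.09, Fermi (goal): 3.11
```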

Getting back to the GTC conference, Jen-Hsun then started talking performance per Watt rather than raw performance, and threw in a few more tidbits. This is where the snow started flying. Kepler is based on a 28nm process, a ‘full’ shrink. For the sake of this article, we will assume that a full shrink gives 2x the transistors per mm^2, and halves the power per transistor. This gives them twice the transistors for the same power budget, basically Moore’s law with some added power corollaries.
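In code, that idealized full-shrink assumption looks like this. To be clear, the 2x density and 50% power figures are the textbook values we just stated, not anything TSMC has promised to deliver.

```python
# Idealized 'full shrink': 2x the transistors per mm^2, half the
# power per transistor. Hold the power budget constant and you get
# twice the logic, i.e. roughly 2x perf/W at unchanged clocks.
def full_shrink(dp_gf_per_watt):
    transistor_scale = 2.0          # 2x logic in the same power budget
    power_per_transistor = 0.5      # halved by the new process
    assert transistor_scale * power_per_transistor == 1.0  # same Watts
    return dp_gf_per_watt * transistor_scale

print(full_shrink(2.1))             # a 28nm C2070 lands at ~4.2 GF DP/W
```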

If you look at the roadmap chart, the Tesla and Fermi numbers line up with the calculated table above, so that is a good start. From there, Kepler comes in at an estimated 5.75 DP GF/W, slightly higher than what Nvidia promised pre-silicon. If you are expecting a new architecture from Kepler in the same vein as the G80 -> G200 or G200 -> GF100 changes, I wouldn't hold my breath.

On a more pessimistic note, you could consider that the 28nm process from most foundries won’t deliver anywhere near the expected 50% power savings. Given that, Kepler looks more and more like a bugfixed Fermi than anything else.

Jen-Hsun then went on to say that Kepler would have 3-4 times the performance per Watt of Fermi, which he claimed was 1.5GF/W. That would put Kepler at 4.5-6GF/W, worst to best case. A simple shrink should get Nvidia from the current 2.1GF/W to just over 4. A fully working Fermi chip with 512 shaders active, instead of the 448 on the best compute board available, would be perilously close to 2.25GF/W, and a shrink of that would sit right at the 4.5GF/W lower bound.
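Feel free to run the numbers yourself; the sketch below uses nothing but Jen-Hsun's stated claims and the shrink model above.

```python
# Jen-Hsun's claim vs. a simple 28nm shrink of Fermi.
fermi_claimed = 1.5                       # GF DP/W, per the keynote
kepler_claim = (3 * fermi_claimed, 4 * fermi_claimed)
print(kepler_claim)                       # (4.5, 6.0), worst to best case

print(2 * 2.1)    # 4.2: shipping C2070 after a full shrink
print(2 * 2.25)   # 4.5: hypothetical 512-shader Fermi, shrunk
```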

On the upper bound, the revised Fermi, GF104, has 336 active cores at a 1350MHz hot clock while consuming 160W. If the DP capabilities were not fused off, that would put it at about 2.8GF DP/W. A shrink of that, slightly higher clocks, and a little power savings from the process would once again put things perilously close to the upper bound.
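The GF104 projection works out like this. The half-rate DP figure is our assumption, since shipping GF104 parts have DP fused down, and the 2x shrink bonus is the same idealized model as before.

```python
# Upper bound from GF104, assuming GF100-style half-rate DP were
# left enabled (it is fused off on shipping parts).
cores = 336
hot_clock_ghz = 1.35
watts = 160

dp_gflops = cores * 2 * 0.5 * hot_clock_ghz  # FMA = 2 FLOPs, half-rate DP
print(dp_gflops / watts)        # ~2.8 GF DP/W today
print(2 * dp_gflops / watts)    # ~5.7 after a full shrink, perilously
                                # close to the 6 GF/W upper bound
```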

No matter how you look at it, in the performance per Watt game, Kepler looks like a minor warming-over. If you were cynical, you might say that they just changed the name to avoid the Fermi taint. If you were really cynical, you could point out the change looks about as great as the 8800 to 9800 jump, but nowhere near the 9800 to 250 leap. Kepler looks like a yawner from the performance per Watt standpoint unless Nvidia really bumps up the feature set. Why did Jen-Hsun put these numbers up? They are an 'own goal'.

That brings us to Maxwell, the 2013 architecture. The process it is on was not named, but given the two-year fab shrink cadence, it is almost assuredly 20nm silicon. That would mean another shrink, and 2x the transistors plus half the power per transistor. So far so good.

Here is where things got interesting. Instead of talking about performance per Watt compared to Fermi like he did with Kepler, Jen-Hsun changed gears and compared Maxwell to Tesla, the 2007 chip. Instead of claiming 16x the perf/Watt of today's chips, which is likely impossible, he claimed 16x the perf/Watt of an obsolete line.

Tesla as put out in the C1060 compute card delivers a .41GF DP/W bottom line. 16x that would be, wait for it, about 6.6GF DP/W. Where have we seen that number before? It is barely above Kepler's claimed best case. Once again, using Nvidia's own numbers, we see that performance per Watt isn't going up markedly for at least the next four years. And if you take the Maxwell figure at face value, Kepler is probably not even as efficient as we estimated above. When you have nothing, spin, and spinning is all the keynote really offered.
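Again, this is Nvidia's own math, just carried one step further than the keynote bothered to.

```python
# Maxwell's 16x is measured against Tesla, not Fermi.
c1060_gf_per_watt = 78 / 188              # ~0.41 GF DP/W
print(16 * c1060_gf_per_watt)             # ~6.6 GF DP/W in 2013,
                                          # barely above Kepler's best case
```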

In the end, far from the huge leaps in performance you would expect from the chart shown, the next two Nvidia architectures are unlikely to deliver anything more than what Moore's law would bring, with a few added features tacked on. Let's hope those features are pretty killer, because the promised specs most definitely are not.

The roadmap release was a move of desperation on Nvidia's part; they have to distract from things like their current gross margins. To do so, they did something they had never done before, and then put up the numbers in a way that would look very good to anyone who didn't bother looking below the shiny bits. It was a classic snow job, and the press fell for it hook, line, and sinker. That is the saddest part. For Nvidia's sake, let's hope the silicon is a lot better than they promised. S|A
