Nvidia’s Kepler comes into focus

Late and compute oriented at the cost of graphics

When SemiAccurate announced that AMD (NYSE:AMD) was aiming for September with Southern Islands (SI), you could almost set your watch to the Nvidia (NASDAQ:NVDA) response. If you are new to the PR game, you will probably scratch your head wondering what we mean by Nvidia response. Officially there is silence, but there definitely was a response.

If a competitor makes an announcement, or even if there is credible news about a competing company, you can bet that PR and IR’s phones start to ring in short order. The normal response is to placate callers by telling them your plans, and how they are better, saner, or more profitable than what the other guys just put out. If your plans are not as good, well, it is time to dance.

Normally, dancing and selectively quoting things is enough for most firms, so that is what they do. Some even go so far as to make a counter-announcement, knowing what the enemy is doing is a wonderful way to point out their shortcomings and play up yours. Unless your stuff stinks, and stinks badly. If it stinks so badly that you can’t even spin it well, then you have to go to alternate routes.

IR is very restricted in what they can and can not say, PR less so depending on the nature of the news and the caller. If either department has no choice but to say things that would not pass a 3rd grade teacher’s sniff test, then they can’t attach their names to it. The SEC frowns on that kind of thing, so the operative word becomes plausible deniability.

People watching the sector closely will be very familiar with the next steps. It usually goes something like this. A journalist that a PR or IR person trusts gets a phone call, usually not an email because that could end up as evidence if things go really wrong. The phone call details an exclusive story for the journalist, and these are usually, but not always, truthful. Most of the time, they stretch credulity, and hit some very carefully chosen high points while ignoring a mountain of evidence that would make the caller’s product look pretty anemic.

If the journalist runs with it, they get a big exclusive. If they question it, or worse yet point out the realities that are being skirted by the caller, that tends to be the last call of the type they ever get. If the information passed along is close enough to the truth, and is printed, that works out well for both sides. If the ‘news’ is basically pure male cow excrement, and it gets printed, the company gets what they wanted, but the journalist gets their reputation trashed.

The last scenario is actually the most common. A PR person will call around, or send emails around, that make the recipient laugh. Loudly. Those tend not to go anywhere. A short while later, that email is sent to the next sucker, and so on and so on until the result is a story that says what the PR department wants out. This ‘news’ will then get repeated by ‘more reputable’ sites shortly thereafter. The next person who calls in to PR or IR asking those pesky questions gets told, “No comment, but did you see that piece that XYZ wrote yesterday? I can’t say anything about it, but it sure was interesting.” Thorny issue ‘solved’, and with near total plausible deniability.

With that background out of the way, we come back to graphics. AMD taped out Southern Islands in February, and likely had silicon in hand before the end of Q1. At Computex, AMD was playing an interesting game: their people were dropping hints that SI was Q1, basically no chance for product in 2011. This seemed plausible because of TSMC’s 40nm and 32nm debacles. In an unusual twist, it looks like TSMC has upped their game significantly for 28nm HKMG, and Q3 looks possible for production. Well done there TSMC.

Nvidia, expecting AMD to release in Q1, seems to have been targeting Q1 as well. Not early Q1 either from what SemiAccurate’s moles tell us, think ‘barely’ instead of early. When credible word that Q3 was on the cards for AMD made the rounds, panic likely ensued at Nvidia and meetings were undoubtedly held. Given the status of the silicon, it seems very unlikely that the real story would have gone over well, so Plan B was put into place. The very next day, there appears to have been a timely ‘leak’.

That ‘leak’ appeared on Fudzilla, and can be found here. We don’t have any doubt that this story is real, nor is this meant to cast any negative light on Fudo, this author holds him in the highest regard. It is however a textbook example of how to spin awful news in a way that can be further spun with both plausible deniability and no pesky contravening facts. Let’s take a close look at it.

The first thing is that Kepler taped out, something that we can’t, as of publication time, independently confirm, but do not doubt. If you recall, when we broke the news that Fermi taped out two years ago, it was the end of July, not the end of June. The resultant GTX480 GPU had a paper launch with just days remaining in March of the following year. That meant a 9 month tapeout to market time, but to be fair, Fermi had an unusually problematic gestation. A solid argument could be made for the GF100B in December of 2010 being the ‘final’ product, but we will stick with 9 months for now, even though GF100 was not a working product on release.

Two key dates to think about here are when AMD taped out, and when Nvidia taped out, February 2011 and June 2011 respectively. From tapeout, the timeline is getting first silicon back in about 6 weeks, and if that is perfect, add three months to that for production. This means a minimum of about 5 months.

Every stepping you have to make adds about 6 weeks to the tally, more if there are tough bugs to quash. The norm is two spins, so about 7-8 months, and AMD is right there now, February to September. If Nvidia is as successful, mid-Q1 is the earliest you should be looking at for Kepler arrival.
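For readers who want to check the calendar math, here is a minimal sketch of the timeline above. It assumes only the figures quoted in this article (about 6 weeks to first silicon, three months to production, about 6 weeks per extra stepping). The function names and the weeks-to-months conversion are illustrative choices, not anything from AMD, Nvidia, or TSMC.

```python
# Rough tapeout-to-market estimate using the figures quoted in the article.
# All names here are hypothetical helpers for illustration only.

WEEKS_PER_MONTH = 52 / 12  # assumed average month length in weeks


def months_to_market(respins, first_silicon_weeks=6,
                     production_months=3, respin_weeks=6):
    """Months from tapeout to product, given a number of extra steppings."""
    weeks = first_silicon_weeks + respins * respin_weeks
    return weeks / WEEKS_PER_MONTH + production_months


def add_months(year, month, months):
    """Return (year, month) after adding a whole number of months."""
    total = year * 12 + (month - 1) + months
    return total // 12, total % 12 + 1


best_case = months_to_market(0)   # perfect first silicon, a bit under 5 months
typical = months_to_market(2)     # two extra spins, a bit over 7 months

amd_eta = add_months(2011, 2, round(typical))     # February 2011 tapeout
nvidia_eta = add_months(2011, 6, round(typical))  # June 2011 tapeout

print(f"best case: {best_case:.1f} months, typical: {typical:.1f} months")
print("AMD:", amd_eta, "Nvidia:", nvidia_eta)
```

With two extra spins the February tapeout lands in September 2011 and the June tapeout lands in January 2012, which matches the Q1 window described above once you allow for the 7-8 month range.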

Back to the article on Fudzilla, the second paragraph contains three bullet points, that the chip has taped out, there is a lot of leakage yet to be dealt with, and the 40nm to 28nm transition is tough. In isolation, these statements would seem quite plausible, and could easily be spun as OK, which is why they came out in the format that they did. If you compare that to other industry data, things don’t look so rosy.

Kepler taping out is good. Kepler taping out more than a quarter after its rival is very bad. The leakage side of things is worse. 40nm chips at TSMC were the last SiON products the foundry made, and were very leaky. 28nm moves to a HKMG process that stops a lot of this leakage, but not all. Intel had HKMG two years ago with their 32nm process, and it works very well for them. AMD just released their first HKMG chip last month.

The troubling part is that only Nvidia seems to be complaining about leakage on 28nm products. Checks by SemiAccurate confirm that leakage is indeed way down from what it was, not perfect, but much better than before. To have one company complain about leakage on 28 seems to be more of an indication of engineering problems than process problems. Then again, Nvidia still blames TSMC for Fermi’s failures. This sure seems like seeding the press for the replay, it is much easier than fixing engineering processes. Now do you see why certain things couldn’t be claimed directly?

The last part is the 40nm to 28nm shrink. Everyone is in agreement that it is tough, very tough, to make this transition. AMD has done so successfully, as have several others. Only one company is complaining. At this point in the narrative, there will be no prizes for guessing which company that is.

There are three more points in the third paragraph, that Kepler is a new architecture, it has the same TDP as Fermi, and the transistor count goes up. All three are kind of no-brainers, that is the way silicon shrinks work. Kepler is similar to Fermi, but by piecing together several data points we learned over the past two years, we can shed a little more light on things.

First is that Kepler has about 2.5x the DP floating point performance of Fermi, at least according to Nvidia projections. This is misleading on a number of fronts, the most notable is that Fermi only does DP FP at half the rate of SP, so doubling that comes down to spending the transistors. The other option is that the shader count went up quite a bit, 2.5x as many as Fermi, 1536 instead of 512. Given that Fermi was larger than is reasonable to make on a cutting edge process at TSMC, if you actually want your chips to yield, this isn’t a good idea.

The last point is that TDP remains constant versus Fermi, something we very much doubt will be the case because of architectural missteps on Nvidia’s part with Kepler. Fermi seemed to draw notably more power than was listed on the spec sheets, and several AIBs have told us about phone calls not so politely requesting that the TDPs on boxes be lowered from reality to official Nvidia spec. The question that should now be asked is whether the TDP will be at the same real number, or lowered substantially to where it currently stands on paper.

This is the key point with Kepler because the TSMC 28nm HKMG process has an outer bound of 45% speed improvement or a 40% power savings according to the TSMC web site. As is always the case, it is an or, not an and, for the savings. Assuming Kepler keeps the same real power draw as Fermi, and is bound by reticle size and production limits to about 2x the transistor count of Fermi, that bounds the box even more tightly.

Update: Fixed 40nm/28nm error above.

The first problem is that you can’t power 2x the transistors you had at the 40nm node in the same TDP without drastically lowering the clock. Your max is going to be about 1.40x Fermi, any more and you exceed the power limits of the chip, and of PCIe cards. Dropping the clocks enough to make 2x transistors possible means that you take a massive yield hit for no real gain over a correctly architected chip with higher clocks and fewer transistors. It makes no sense. This means you are looking at 1.5x the transistors, give or take a bit.

That puts shader counts at around 750, meaning Kepler would have DP performance at 1.5x Fermi assuming no other changes to the architecture. This is about half of what Dear Leader promised, and that gap is more than enough to point to serious architectural changes on the FP unit.

SemiAccurate has heard two things about Kepler, the first is that the chip is heavily skewed towards HPC/compute at a commensurate areal cost to graphics. The other bit is power management is still nowhere near what AMD had for Evergreen/HD5xxx, much less Northern Islands/HD6xxx. Coupled with the earlier bounding boxes, we can safely say that the Kepler chips will very likely have full rate DP performance, coupled with less than a 50% increase in shader counts, and a clock in the neighborhood that Fermi is now. Each shader will likely do full rate DP as well.
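The shader and DP numbers above are easy to sanity check. The sketch below is a back-of-the-envelope model, not anything from Nvidia: it assumes DP throughput scales linearly with shader count, clock, and the DP-to-SP rate, and it uses only the figures quoted in this article (512 Fermi shaders, half rate DP on Fermi, a 2.5x DP target). The function name `dp_ratio` is made up for illustration.

```python
# Back-of-the-envelope DP scaling relative to Fermi.
# Assumption: DP throughput ~ shaders * clock * (DP rate relative to SP).

FERMI_SHADERS = 512   # shader count quoted in the article
FERMI_DP_RATE = 0.5   # Fermi runs DP at half the SP rate, per the article
DP_TARGET = 2.5       # Nvidia's projected DP gain over Fermi, per the article


def dp_ratio(shader_scale, dp_rate, clock_scale=1.0):
    """DP throughput relative to Fermi under the linear model above."""
    return shader_scale * clock_scale * (dp_rate / FERMI_DP_RATE)


shaders_at_1_5x = round(FERMI_SHADERS * 1.5)       # "around 750" shaders
half_rate_gain = dp_ratio(1.5, 0.5)                # no FP unit changes
full_rate_gain = dp_ratio(1.5, 1.0)                # full rate DP per shader
scale_for_target = DP_TARGET / dp_ratio(1.0, 1.0)  # shader scale for 2.5x

print("shaders at 1.5x:", shaders_at_1_5x)
print("DP gain, half rate:", half_rate_gain)
print("DP gain, full rate:", full_rate_gain)
print("shader scale needed for 2.5x at full rate:", scale_for_target)
```

Under these assumptions, 1.5x the shaders at half rate DP gives only 1.5x Fermi, while full rate DP at Fermi-like clocks would reach the 2.5x target with roughly a 25% increase in shaders, which is consistent with the "less than a 50% increase" estimate above.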

The last thing to note is the massive difference between Tesla branded Fermis and GeForce branded Fermis. If Nvidia is basing the 2.5x difference on the hugely downgraded Tesla variants, then the above numbers should be adjusted to compensate for that. Then again, if that is the case, pity Nvidia’s prospects in the professional market. Short story, the most likely Kepler candidate will have about a 50% higher shader count, full rate DP, and similar clocks to Fermi.

The last two paragraphs of the story are quite telling also, they start blaming TSMC for any lateness before the first chip ships. Apple, AMD and others have no problem at 28nm, so why does Nvidia? Not wanting to beat a dead horse, but the fault is not likely with TSMC this time. Again. The foundry did have problems with 40nm, but those were solved for everyone, but Nvidia still seemed to struggle long after. Reports about Tegra 3’s yields, or lack thereof, tell a very direct tale about Nvidia’s current process engineering.

These last two paragraphs are starting to lay the foundation for finger pointing at TSMC long before the chip has been shown off publicly. You really have to ask yourself why that is necessary six months or more before launch? S|A

Update: There has been a follow up article on Fudzilla, it can be found here. It says that Q4/2011 is not going to happen for Kepler, Q1 it is. TSMC is blamed for the delay as was expected, and AMD is said to have the same problems. SemiAccurate’s sources say that AMD is not having anything more than the usual new process headaches, and nothing like the 40nm problems from 2009.

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.