Nvidia’s five new Keplers raise a red flag

Manufacturing woes are worse than they seem

Nvidia released five ‘all new’ video cards last week, and that is a cause for celebration, right? If you understand the underlying tech, you quickly realize that instead of a celebration, this is a red flag waving over the company’s manufacturing ability. Nvidia is in serious trouble.

The news itself is fairly innocuous: there are five ‘new’ cards out now, the GT 645, the GT 640, the GT 640, the GT 640, and the GT 630. No, that is not a typo; there are three GT 640s, and two of the five ‘new’ cards are actually older 40nm GPUs with a new badge. All five are OEM desktop cards, and the three ‘new’ 28nm products are all based on the same GK106/GK107 ASIC. The best chart listing the new additions is at AnandTech.

While this may look like good news, it is actually a damning indictment of Nvidia’s ability to manufacture anything, much less a large and complex ASIC on a bleeding-edge process. By its own admission, the company can’t get the 28nm process to work well enough to produce GPUs in acceptable volumes, a scary thought for a company that lives and dies by such products. Its 28nm manufacturing record is the diametric opposite of what Jen-Hsun Huang claimed when he said, “[Our experience with 28nm] is looking really good, it is looking much better than our experience with 40nm. It is just a comprehensive, across-the-board engagement between TSMC and ourselves making sure that we are ready for production ramp when the time comes. So I feel really good about 28nm.” Bravado aside, nine months later, Nvidia is the only company using TSMC’s 28nm process that cannot get financially viable yields.

This is evident in three ways: the availability of launched products, which products were launched, and direct statements from the CEO himself. Each tells a different story, but they all add up to the same conclusion: Nvidia’s management has severe problems, and those problems are destroying the company’s ability to function on a technical level. This is not a new problem, but it has become critical in the last few months because of an expiring agreement with TSMC.

During the last two years, the manufacturing malaise was masked on all 40nm products because TSMC allowed Nvidia to pay only for good die, effectively eating the cost of Nvidia’s design woes. “Because of our potentially limited access to wafer foundry capacity and our recent transition to a wafer buy model where the costs of our products are based on the price per wafer versus price per functional die, decreases in manufacturing yields could result in an increase in our costs and force us to allocate our available product supply among our customers. Lower than expected yields could potentially harm customer relationships, our reputation and our financial results.” That quote comes from a recent Nvidia 10-K filing, where it is buried several pages in. With the transition to 28nm, Nvidia will have to pay for its own mistakes, and that is looking quite unpleasant.
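
To make the difference between the two pricing models concrete, here is a minimal Python sketch of cost per functional die under a wafer-buy model. The function name and every input (wafer price, candidate dies per wafer, yields) are hypothetical placeholders chosen for round arithmetic, not Nvidia or TSMC figures, and the sketch ignores packaging, test and other costs.

```python
def cost_per_good_die(wafer_price, gross_dies_per_wafer, yield_fraction):
    """Cost of each functional die when the buyer pays per wafer.

    Under a wafer-buy model the buyer pays for every die on the wafer,
    working or not, so the wafer price is spread over the good dies only.
    All inputs are hypothetical illustration values.
    """
    if gross_dies_per_wafer <= 0:
        raise ValueError("gross_dies_per_wafer must be positive")
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    good_dies = gross_dies_per_wafer * yield_fraction
    return wafer_price / good_dies


if __name__ == "__main__":
    # Hypothetical inputs: $5,000 per wafer, 200 candidate dies per wafer.
    for yield_fraction in (0.8, 0.4):
        cost = cost_per_good_die(5000, 200, yield_fraction)
        print(f"yield {yield_fraction:.0%}: ${cost:.2f} per good die")
```

With these made-up inputs, halving the yield from 80% to 40% doubles the cost of every good die, from $31.25 to $62.50. Under the good-die arrangement described above, the price per functional die would not move with yield at all; the foundry would absorb the difference.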

The first sign is the availability of the GTX 680, the flagship of the Kepler line, based on an ASIC code-named GK104. As we said months before anyone else, Kepler is the clear winner this round, assuming the chips can be made. It has been more than half a quarter since launch, and the card is still almost completely unavailable. Shipments to major retailers are small and sporadic, if they come at all, and seem to be more of a PR event than a restocking. Even the halo retailers that get a disproportionate share of supply are out of stock, which signals essentially zero supply.

SemiAccurate’s channel checks have confirmed the anecdotal evidence that simple shopping provides: Nvidia can’t supply GK104-based GPUs. Sources tell us that AMD shipped more than 10,000 units in its initial shipment of Tahiti-based GPUs, the GK104’s direct competitor, and a second, larger shipment followed the first. Both were before the launch of the first Tahiti card. Since then, with the exception of TSMC’s still-unexplained hiccup, shipments have been frequent and plentiful. Stock in the channel is also plentiful, and Tahiti has never been completely out of stock, even if specific models come and go.

In contrast, sources tell SemiAccurate that initial shipments of GK104s, which launched more than three months after Tahiti, were unlikely to exceed 1,000 units worldwide. Those same channel sources tell us that, to date, the volume of GK104-based Kepler cards is almost assuredly fewer than 10,000. TSMC’s 28nm process has now had more than six months to mature, essentially a quarter of its lifespan, and Nvidia still cannot get yields up to par. Everyone else, however, can. Nvidia repeatedly says, as reported in many places, that the abject unavailability of the GTX 680 is due to demand, but the shipping numbers directly contradict that claim.

You may recall that during the Q&A session of last quarter’s conference call, Nvidia was rather subdued about 28nm. Jen-Hsun Huang said, “The gross margin decline is contributed almost entirely to the yields of 28-nanometer being lower than expected. And that is, I guess, unsurprising at this point.” While the effect on gross margins may not be a surprise, the fact that Nvidia was having problems on 28nm certainly was. Why? Because it is the only company having problems with 28nm. That admission also sits poorly with the rather stretched demand-versus-supply reasoning highlighted above.

The layman’s view is that everyone is having problems on 28nm, and, TSMC’s little ‘whoopsie’ in Q1 aside, that is simply not true. There is a serious shortage of 28nm wafer starts, which are on severe allocation at TSMC, but that says nothing about yields. Supply is essentially wafer starts times yield, and a shortfall in either can lead to a shortage of end product. However, few wafer starts or low yield alone do not necessarily mean a company will be unable to meet its demand; that depends on the demand itself. Increasing either factor, or both, means greater supply.
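
The supply relationship is simple enough to sketch in a few lines of Python. This version also multiplies by the number of candidate dies per wafer, which the shorthand above leaves implicit. The function names are hypothetical, and every number is a made-up placeholder, not a real 28nm figure.

```python
def good_die_supply(wafer_starts, gross_dies_per_wafer, yield_fraction):
    """Functional dies produced: wafers started x candidate dies per wafer x yield."""
    return wafer_starts * gross_dies_per_wafer * yield_fraction


def can_meet_demand(demand, wafer_starts, gross_dies_per_wafer, yield_fraction):
    """True if supply covers demand; neither wafer starts nor yield alone decides this."""
    return good_die_supply(wafer_starts, gross_dies_per_wafer, yield_fraction) >= demand


if __name__ == "__main__":
    # Hypothetical: 200 candidate dies per wafer, demand of 150,000 good dies.
    demand = 150_000
    print(can_meet_demand(demand, 1_000, 200, 0.5))  # 100,000 good dies -> False
    print(can_meet_demand(demand, 1_000, 200, 0.8))  # 160,000 -> True (better yield)
    print(can_meet_demand(demand, 1_600, 200, 0.5))  # 160,000 -> True (more wafers)
```

Either lever closes the same gap, which is why a wafer-start shortage and a yield problem can look identical on a store shelf while meaning very different things for the company behind them.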

The other two bellwether companies shipping complex 28nm products made at TSMC are AMD and Qualcomm. Both tell a similar story about 28nm supply: wafers are scarce, but there are no yield problems. AMD has three GPUs currently shipping on 28nm, all of which are available in quantity, and it met customer demand in Q1. Limited wafer starts cap upside, but they do not equate to yield issues. To quote Thomas Seifert on AMD’s Q1 2012 conference call, “We were able to meet customer demand in the first quarter. The products are good. The demand is strong. We would like to see more access to upside volume but we have met all — pretty much all demand in the first quarter.”

Qualcomm has been echoing that sentiment in an eerily similar fashion: no 28nm yield problems, but plenty of upside limited by wafer starts. If you look, you can find even more positive statements about 28nm from the FPGA vendors, but we consider them a special case because of their limited volumes and their products’ extreme defect tolerance. In short, everyone is complaining loudly about wafer starts, with no exceptions we can find among large-volume customers. Only one company is complaining about yield, Nvidia, and that is directly reflected in its products. Draw your own conclusion.

Last on the list are the five new cards, and there is a lot here to look at. The first thing to note is that three completely different GPUs are all marketed under the same name, GT 640. To call this unethical is to give it more praise than it deserves, but we will stick to technical items for this analysis. The GT 645 is the top of the heap, and it is not a new part. Does the fact that Nvidia’s performance leader in the volume segment cannot be produced on 28nm set off any alarm bells for you? How about the fact that the next model down has a 40nm part too? Any guesses which of the three GT 640 SKUs will ship in volume? Any guesses as to how the shipments will be characterized, lumped together or broken out?

This says in no uncertain terms that Nvidia’s GPUs did not meet their performance goals, a problem different from, and unrelated to, the functional yield problem. In technical terms, the top speed bin wasn’t yielding well enough to be viable, so Nvidia had to put older parts on life support and slide them into the current lineup. Performance aside, semiconductors are shrunk to reduce cost, and the market the GT 645/640/630 plays in is very cost sensitive. If there were any way to make these parts on 28nm, Nvidia would be doing it. It is not. Contrast that with AMD, which is both making competitive parts on 28nm and supplying its customers. Nvidia has a problem here; draw your own conclusions as to why.

On the high end, the GTX 680 is a 28nm product, but it is almost completely unavailable nearly two months after launch. The second product launched is usually the cut-down version of the big part; for GK104, that would be the GK104-335, aka the GTX 670. If a GPU does not meet its performance goals, i.e. the desired bin has low yields, the next bin down tends to have much greater supply, usually an order of magnitude or more. For the most part, the more severe the top-bin yield problems are, the greater the yields of the next bin tend to be. So after the GTX 680, the GTX 670 is the next card launched, right?
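
A toy Python sketch of speed binning shows why a weak top bin normally means plentiful parts in the bin below: functional dies that miss the top clock target do not vanish, they fall through to the next threshold. The bin names, MHz thresholds and sample clocks are all hypothetical, and real binning also weighs voltage, power and disabled units, which this sketch ignores.

```python
from collections import Counter

# Hypothetical bins, highest first: (name, minimum stable clock in MHz).
BINS = [("top_bin", 1000), ("cut_down_bin", 900)]


def assign_bin(max_stable_clock_mhz, bins=BINS):
    """Place a functional die in the highest bin whose clock target it meets."""
    for name, min_clock_mhz in bins:
        if max_stable_clock_mhz >= min_clock_mhz:
            return name
    return "reject"


def bin_counts(clocks_mhz, bins=BINS):
    """Count how many dies land in each bin."""
    return Counter(assign_bin(c, bins) for c in clocks_mhz)


if __name__ == "__main__":
    # Hypothetical measured clocks for six functional dies.
    print(bin_counts([1050, 980, 950, 1010, 870, 920]))
```

Lowering the sample clocks shifts counts out of top_bin and into cut_down_bin rather than removing them, which is the pattern the reasoning above relies on when it expects GTX 670 supply to be ample.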

If you said ‘right’, you would be wrong this time. Nvidia instead launched a card called the GTX 690. The problem is that the GTX 690 is two GTX 680s on a single card, and Nvidia cannot even supply the single-GPU version. The GTX 690 has a price tag of $999, so demand will probably be a few thousand units worldwide over the life of the product; it is a halo piece in the extreme. Once again, Nvidia cannot supply them, which is not exactly a leap of logic considering they are made from a sub-bin of GTX 680 GPUs. A percentage of zero is still zero.

That still does not explain what happened to the GTX 670, a part that should be extremely plentiful. Demand should be notably higher than for the GTX 680 because of the lower price, but supply should be proportionately higher too. Yet the cards are not available. Something is seriously wrong here, but we can’t yet say for sure what it is. The GTX 670 is set to launch in mid-May, and the delay is telling.

All of this analysis answers a few of SemiAccurate’s earlier questions, chiefly confirming that Nvidia simply can’t make working 28nm chips in quantity. Its claims that 28nm was going well, and that things were different this time, are exactly what we said they were: laughable. Those claims were aimed at a specific audience that could not evaluate them at the time but had the ability to affect the company.

28nm yields at Nvidia, but not elsewhere, do not seem to be financially viable at the present time. Now that TSMC’s pricing no longer shields the company from its own management’s incompetence, this could be borderline disastrous. The obvious fix, running more wafers, would only dig the fiscal hole deeper. And wafer shortages make even that impossible for the moment, even if Nvidia decided to burn money to meet customer demand and avoid long-term reputational problems, not to mention contractual ones. Nvidia simply cannot meet its customer commitments, and it is going to cost the company.

Open questions remain, though. The most pressing are what is going wrong with GK104, and why the company’s physical design woes have not been fixed years after they started. Whatever the problem is, it also carries over to the smaller GK106/7, which is why we say it is indicative of extreme technical mismanagement. Any outward-pointing fingers can be summarily dismissed, since AMD, Qualcomm, and the rest are not having similar issues. Similarly, the mask problems we hear about point the finger firmly back at Nvidia and a lack of basic internal technical controls.

Instead of a triumphal launch of five new cards, we are witnessing a company on the verge of imploding. One entire line cannot be supplied, and the volume SKUs of the other have to be filled with older, lower-margin GPUs. Instead of answers, we get vague promises of supply in the coming months, but nothing concrete. Can Nvidia supply the cards it just launched? No. Why would competent management not fix the systemic failures that alienate end users and OEMs in this fashion? S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.