Remember the days when you could buy a GPU as a discrete component because it wasn’t part of your CPU? Those days are long gone for the mainstream, and the days of using discrete parts instead of integrated functionality are about to go bye-bye as well.
Yes, Intel has finally had enough of those pesky GPU vendors showing them up, and has finally gotten sick of GPUs with functional drivers making their life hard. Rather than making something that is better and winning on merit, Intel is just slamming the door on Nvidia and AMD. How they are doing it is quite interesting; how they are going to get away with it is even more so. But that is only the beginning of a very complex story.
The first step to the endgame
The beginning of the end starts with the Broadwell generation. No LGA means no desktop, so the best you are going to get is an embedded CPU on a board for all-in-one and small form factor use. In and of itself, a BGA CPU doesn’t preclude a PCIe slot or two, but if you like PCs, Broadwell is simply not for you. Haswell gets a mild tarting up when Broadwell is released, and Intel thinks this will hold the enthusiast until Sky Lake. SemiAccurate sources say Intel may or may not release a socketed Sky Lake, so ask again in a year, but don’t count on it happening.
How much bandwidth is needed?
So how is Intel going to make GPUs go away? The first step is what you might have guessed already: strangle them. GPUs need bandwidth; the higher their performance, the more they generally need, and that need increases with every passing generation. Some GPUs are more bandwidth hungry than others of similar performance, but the most relevant metric for bandwidth use is the application being run. Some games, and most GPGPU work, thrive on bandwidth; others do not.
If you look at this TechPowerUp story, you will see that when you lower bandwidth, some games suffer badly and others don’t. Other sites like Tom’s Hardware come to similar conclusions. PCIe2 8x is probably the lowest bandwidth a modern GPU can reasonably have without noticeable performance degradation, and PCIe3 4x offers essentially the same bandwidth, so it is equivalent for almost everything. Go down from there, and your experience will suffer with a modern GPU.
In the future, GPU performance will of course go up. Nvidia is promising OEMs about 80% higher performance with Maxwell than you get from today’s Kepler, so bandwidth needs are going to go up by roughly the same percentage. The TechPowerUp story used a GK104 Kepler and Tom’s Hardware used a GF110 Fermi, which is a generation earlier, so you can get a bit of an idea where Maxwell will end up.
For a distantly related article whose research led to this one, SemiAccurate asked Intel, AMD, Nvidia, and Jon Peddie to give us their estimated GPU bandwidth requirements for current, two-year-old, and two-years-in-the-future GPUs in various scenarios. AMD and Intel declined on the grounds that the information is proprietary, and Nvidia didn’t bother with an excuse. Jon Peddie didn’t get back to us in time for publication. That said, their answers probably would not have differed much from the test results above. If PCIe2 8x/PCIe3 4x is the minimum now, the requirement will undoubtedly be notably higher two generations from now when Broadwell comes out. The next step up from PCIe2 8x bandwidth is PCIe2 16x/PCIe3 8x, and the requirements of GPUs two years out are likely somewhere between those two points, so that should be an adequate minimum for Broadwell.
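For anyone who wants to check the PCIe2 8x versus PCIe3 4x equivalence, here is a minimal Python sketch. The per-lane signalling rates and encoding schemes are the published PCIe figures; the function name and table are made up for illustration, and the result ignores packet and protocol overhead, so treat it as a rough upper bound rather than a measurement.

```python
# Per-lane raw signalling rate (GT/s) and line-encoding efficiency
# for each PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def usable_gbytes_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt * efficiency * lanes / 8  # divide by 8 to go from bits to bytes


# Print the link widths discussed in this article.
for gen, lanes in [(2, 8), (3, 4), (2, 16), (3, 8), (2, 4)]:
    print(f"PCIe{gen} {lanes}x: {usable_gbytes_per_s(gen, lanes):.2f} GB/s")
```

PCIe2 8x works out to 4.00 GB/s and PCIe3 4x to about 3.94 GB/s, which is why the two are treated as interchangeable here.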
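The "somewhere in between" estimate can be sanity-checked with a few lines of Python. This is only a sketch of the reasoning above: it assumes bandwidth need scales one-for-one with the roughly 80% performance gain claimed for Maxwell, which is this article's working assumption and not a measured figure. The variable names are illustrative.

```python
# Usable one-direction bandwidth in GB/s for PCIe2 (5 GT/s per lane, 8b/10b encoding).
PCIE2_X8 = 5.0 * 0.8 * 8 / 8    # today's practical minimum per the tests cited above
PCIE2_X16 = 5.0 * 0.8 * 16 / 8  # the next step up

# ASSUMPTION: bandwidth need grows in step with the ~80% performance gain.
GROWTH = 1.80

projected_need = PCIE2_X8 * GROWTH

print(f"Minimum today:   {PCIE2_X8:.1f} GB/s")
print(f"Projected need:  {projected_need:.1f} GB/s")
print(f"Next step up:    {PCIE2_X16:.1f} GB/s")
print("Falls between the two steps:", PCIE2_X8 < projected_need < PCIE2_X16)
```

Under that assumption the projected need lands at about 7.2 GB/s, above PCIe2 8x and just under PCIe2 16x.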
The noose is tightened
Any guesses how many lanes Broadwell has? If you said 4x, you got it. The problem? According to SemiAccurate’s sources, it will only be PCIe2, not PCIe3. 20Gbps of raw signaling is not enough to support a modern GPU without some severe performance degradation, so how well do you think an 80% faster one will fare? Make no mistake, this will severely hamstring any mid-range or higher GPU of that generation, and that is the point.
Intel will likely defend this move on the grounds of power savings. PCIe is a power hog, and the fewer lanes you have, the more power you don’t burn. Even when not in use, PCIe lanes leak power, and since Broadwell is a mobile-only part, that is of great concern. In this regard, Intel is completely correct to remove as many PCIe lanes as they can, but they can’t take them all because of the FTC settlement linked below, not to mention that most users need to connect some peripherals here and there. Not putting any in would both violate that settlement and make the part totally unpalatable to OEMs and laptop makers. You need to have some PCIe lanes, and 4x PCIe2 is about the lowest you can go and still have the Dells, HPs, and Lenovos of the world consider your parts.
One minor technical issue torpedoes this train of thought though: power savings. PCIe3 is an 8Gbps link while PCIe2 is 5Gbps, but encoding differences mean PCIe3 has roughly twice the usable bandwidth that PCIe2 does. That clock increase ups power use a bit, but that is by far overshadowed by the power savings technologies that are present in the new spec. Without going into the specs themselves, PCIe3 put tons of effort into this area, and is the first of the line to take saving energy seriously. PCIe2 essentially has no power savings; PCIe3 does. That said, at high utilization, PCIe3 probably burns more power than PCIe2 for the same lane count.
With that in mind, I will let the reader decide for themselves how much the decision to put PCIe2 on Broadwell has to do with saving energy versus crippling discrete GPUs. More importantly, Sandy Bridge-E/EX and all Ivy Bridge parts have PCIe3, and Haswell likely does too; it is now the industry standard. If Broadwell is going back to PCIe2 like our sources suggest, that seems to be far more effort than simply leaving PCIe3 in there. Once again, please make up your own mind on this decision. It could potentially save a lot of transistors or have some other *snicker* plausible reason.
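To put the 20Gbps figure in context, the short Python sketch below works out what a 4x PCIe2 link actually delivers once encoding is accounted for, and compares it with the PCIe2 8x minimum discussed earlier. The variable names are illustrative, and protocol overhead is ignored, so real-world numbers would be somewhat lower.

```python
LANES = 4
RAW_GT_PER_LANE = 5.0  # PCIe2 signalling rate per lane
ENCODING = 8 / 10      # 8b/10b: 8 payload bits for every 10 bits on the wire

raw_gbps = RAW_GT_PER_LANE * LANES  # the headline "20Gbps" figure
usable_gbps = raw_gbps * ENCODING   # what survives the encoding
usable_gbytes = usable_gbps / 8     # bits -> bytes

# PCIe2 8x, the practical minimum for a current GPU per the tests cited earlier.
minimum_today_gbytes = RAW_GT_PER_LANE * ENCODING * 8 / 8

fraction_of_minimum = usable_gbytes / minimum_today_gbytes

print(f"Raw:    {raw_gbps:.0f} Gbps")
print(f"Usable: {usable_gbps:.0f} Gbps ({usable_gbytes:.1f} GB/s)")
print(f"Share of today's minimum: {fraction_of_minimum:.0%}")
```

In other words, the link offers about 2 GB/s each way, half of what a current GPU wants before any generational growth is factored in.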
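The encoding point is easy to verify. The sketch below compares a single PCIe2 lane with a single PCIe3 lane using the published signalling rates and encodings (8b/10b for PCIe2, 128b/130b for PCIe3); the variable names are illustrative only.

```python
# Per-lane figures for each generation.
pcie2_raw = 5.0         # GT/s
pcie2_eff = 8 / 10      # 8b/10b encoding, 20% overhead
pcie3_raw = 8.0         # GT/s
pcie3_eff = 128 / 130   # 128b/130b encoding, about 1.5% overhead

raw_ratio = pcie3_raw / pcie2_raw
usable_ratio = (pcie3_raw * pcie3_eff) / (pcie2_raw * pcie2_eff)

print(f"Raw signalling ratio:   {raw_ratio:.2f}x")
print(f"Usable bandwidth ratio: {usable_ratio:.2f}x")
```

The raw clock only goes up by 1.6x, but the leaner encoding pushes the usable figure to just under 2x.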
Dapper cap you have sir!
More interesting than power is to think about PCIe 4x as a performance ceiling. Basically, bandwidth caps the performance that you can get out of a discrete GPU no matter how fast it is. Even if Intel goes with PCIe3 4x, anything more than a mid-range GPU will be throttled down to the performance of a mid-range GPU. The PCIe width means that if you put in a discrete GPU, you won’t get acceptable performance out of it. But that ceiling will still be better than an integrated GPU, right?
Well, maybe not. Remember that Haswell GT3 has roughly 5x the shader performance of Sandy Bridge, and that was borderline acceptable for mainstream use. Add in Crystalwell, and you have vastly higher performance than a vanilla Haswell GT3. Throw in the power savings aspect and you have something very tempting for mobile users. Unless you price it so high that OEMs balk. Priced higher than a discrete part with vastly higher performance, Haswell GT3 + Crystalwell is essentially unpalatable to OEMs regardless of performance and power. Broadwell will improve on both fronts: performance, rumored to be around 40% faster, and power, by going to a 14nm process. Throw in a successor to Crystalwell on lower end SKUs and you have a power savings trifecta.
Unless you once again price it at silly margins, leaving discrete GPUs, and the memory they need, cheaper to add than your integrated graphics solution. In that case, OEMs have made it very clear that abusive margins are not something they want to pay; they will go discrete far more readily. The fact that it gives higher performance to the user is just icing on the cake, cheaper is what they care about. Power savings is nice, but a dollar or two saved on the BoM side will mitigate most of that with a bigger battery.
But what if those abusive margins don’t go down for Broadwell, and they go up instead? OEMs will run to discrete GPUs far faster then, right? Unless they know that adding a discrete GPU won’t get them any more performance than the Intel solution, only higher power draw. If Intel really does their math right when picking PCIe widths and levels, they can pretty much ensure that even if you put in two of the top end 2014 GPUs, graphics performance will go down compared to the high end Broadwell.
Where do you think PCIe2 4x lands on this spectrum? Haswell GT3 performance is >5x Sandy Bridge, not counting Crystalwell or potential clock speed gains. Broadwell adds 40% or so to this figure, so how does that stack up to what you can drive across a narrow, wheezing PCIe link? No points for figuring this out, too easy. So what Intel is effectively saying is, “So, you think you don’t want to pay us what we want, eh? Fine, try it on Haswell, but you’ll be back for Broadwell. Like it or not, we will raise rates, and if we can’t get you to play ball voluntarily, we’ll take away the option of going elsewhere. Pay us or you will have low end graphics and high power consumption. Good luck selling that, even if your machines have a white fruity logo.” It is kind of hard to argue with their logic there, at least the technical parts. S|A
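The ceiling argument can be shown with a deliberately crude Python toy model. Everything in it is hypothetical: the relative scores are invented purely to show the shape of the argument, and real games do not hit a hard wall the way `min()` does. It is a sketch of the reasoning, not a benchmark.

```python
def effective_performance(gpu_perf: float, link_ceiling: float) -> float:
    """Toy model: a card delivers its own performance or the link's ceiling, whichever is lower."""
    return min(gpu_perf, link_ceiling)


# HYPOTHETICAL relative scores, not measurements of any real part.
LINK_CEILING = 100.0  # what the narrow PCIe link can keep fed
cards = {
    "mid-range": 100.0,
    "high-end": 160.0,
    "dual high-end": 300.0,
}

results = {
    name: effective_performance(perf, LINK_CEILING)
    for name, perf in cards.items()
}

for name, perf in results.items():
    print(f"{name}: {perf:.0f}")
```

Under this simplification every card lands on the same score, so the extra money and power spent on the faster ones buys nothing.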
Note: This is Part 1, Part 2 will be posted tomorrow.