INTEL’S SANDY BRIDGE is days away from launch, but the big news this week is the massive set of changes coming to the next-generation Ivy Bridge. If Intel manages to get their drivers in order, Ivy Bridge could destroy the entire low-end GPU market.
OK, let’s be realistic: Intel won’t get their drivers in order. Five years in, they are still irrevocably broken, but Ivy Bridge will change the game on the hardware side. Ivy will give a massive speed boost to the GPU side of things through the most unlikely of places: chip packaging technology. The big secret for Ivy Bridge is a silicon interposer coupled with lots and lots of DRAM under the thermal cap.
How much memory? Rumors from the Far East say that the high-end SKUs might carry as much as 1GB of GPU memory on board. It won’t be very fast, but it will probably be hugely wide. The magic of this all is in the silicon interposer (SI) and stacked memory. When AMD finds out, they are going to be very scared, so don’t tell them.
What is an SI, and what is stacked memory? Stacked memory is basically what it sounds like: take a bunch of memory dies and stack them one on top of another. If you own a smartphone or a high-capacity micro-SD card, you are almost assuredly the proud owner of at least one memory stack. The technology to make them isn’t all that tricky, but powering them and cooling them is, which is why the technology is currently limited to low-power devices.
“But Ivy Bridge is a 100W or so part, hardly low power,” I hear you say. Yes, but the memory on it is going to be low power, and thus low speed, which is why it has to be very, very wide. It is also why Intel is going the SI route instead of stacking the memory on the CPU like cell phones do.
An SI is basically a very simple chip, a slab of silicon ground very thin, with a small number of metal layers, possibly only one. The problem it solves is that when you put two chips on a multi-chip module (MCM), even the thinnest traces you can draw on the fiberglass or organic substrate are fat and bloated compared to those on a chip. That is fine with one die, because the purpose of the substrate is basically to break out the pins so the motherboard or socket can be made cheaper, or in some cases made at all.
If you are connecting two chips with lots of external connections, like a combo CPU/GPU to a wide memory stack, and you still have to connect it to the outside world as well, pin counts explode. This in turn explodes the trace count that the substrate needs to route, and that means more layers.
More layers mean more cost and a much higher defect rate. Basically, costs balloon enough that it is uneconomical. Cost is the reason that all GPUs use external DRAMs, and the reason the GT200 was EOL’d so early. You can do wide paths between chips, it just isn’t worth it.
That is where the SI comes in. Instead of drawing traces at substrate scale, you draw them at chip metal layer scale, something that can be an order of magnitude or more thinner. Then routing between adjacent chips becomes far less of an issue, and you can pack them together at much tighter pitches.
Drawn by drunken monkeys in beginner art class
Keeping the CPU and the memory in two separate stacks solves most of the heat dissipation problem, from a packaging standpoint anyway, and the interposer solves the MCM path problem. The only remaining calculation is, “Is the SI worth its cost compared to doing it the ‘old way’?” For Intel, the answer on Ivy Bridge is yes.
From here, however, things get a little hazier. How much memory? Our sources say that if the stacking tech works out, the high-end SKUs could get up to 1GB of memory under the heat spreader. This will undoubtedly be confined to high-end laptop SKUs; most won’t get anything near that, if they get any at all. Stacking of the DRAM allows Intel to seriously differentiate between lines, in a way that is much more than the current bulls^h^h^h^h^h fluff, ahem, bull-fluff.
An iSomethingmeaningless would have no memory, an iSomethingmeaningless+2 would have 128MB, iSomethingmeaningless+4 could have 512MB, and finally the iSomethingmeaningless+4 XXXLOLBBQ edition might have 1GB. If the yields don’t get to within acceptable ranges over the next 9-12 months, Intel could just forget about it entirely, and use the thermal budget of the DRAM for higher clocks. It’s not like people would be surprised if that generation of GPU underperforms too.
What kind of memory will they use? All signs point to LPDDR2. Why use something that old and slow? Because it is the one type of memory available in stacks; there is no other choice. Intel could be defining their own standard, but checks with the memory makers seem to indicate that no such work is in progress, much less in low-volume test runs. If they aren’t running chips by now, it is too late for Ivy. There are other, definitely faster, specs wending their way through JEDEC, but they won’t be here in time.
Remember when we said that an SI would allow for very wide memory buses? How wide will Ivy Bridge be? Sources again say that 512 bits is the likely candidate, again because there is a commercially available stack of LPDDR2 in that width. Even if it only runs at an effective 1066MHz, that works out to roughly 68GB/s, in the same class as the bandwidth available to an AMD Radeon HD 5770. For an integrated part, 1GB of frame buffer at that speed is well into the overkill category.
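For anyone who wants to check the math on a 512-bit stack at an effective 1066MHz, here is a rough sketch of the peak bandwidth arithmetic. The function name is made up for illustration, and the Radeon HD 5770 figures (128-bit GDDR5 at 4.8GT/s effective) are the reference card specs as we understand them, not something from our sources. Treat the output as theoretical peak numbers, not measured performance.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second).

    Hypothetical helper for illustration: bytes moved per transfer
    times transfers per second, ignoring all real-world overhead.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Rumored Ivy Bridge memory stack: 512 bits wide, 1066 million transfers/s.
ivy_stack = peak_bandwidth_gb_s(512, 1066e6)

# Assumption: Radeon HD 5770 reference spec, 128-bit bus at 4.8 GT/s effective.
hd5770 = peak_bandwidth_gb_s(128, 4.8e9)

print(f"Ivy Bridge stack (rumored): {ivy_stack:.1f} GB/s")
print(f"Radeon HD 5770 (assumed):   {hd5770:.1f} GB/s")
print(f"Ratio: {ivy_stack / hd5770:.2f}")
```

The takeaway is that a slow but very wide bus lands in the same neighborhood as a fast but narrow one, which is the whole point of going wide on the interposer.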
In the end, Ivy Bridge will uncork the Intel GPU in a way that basically no one is expecting. The architecture may suck, the drivers may be broken, but bandwidth will not be a problem for a long time. If everything works out, Ivy Bridge will be a massively fast graphics chip, and AMD’s Krishna will be hard pressed to keep up. Intel is flexing their chip packaging muscles, and the payoff is going to shock people. S|A
By Charlie Demerjian