Meet Larrabee, Intel’s answer to a GPU


Editor’s Note: Over the next few weeks we’ll be publishing the Bumpgate Series with some additional commentary, updates and information. We are reprinting some of the often-referenced articles that originally appeared on The Inquirer. Some will have added content, but all will be re-edited from the originals as per contractual obligations. You may see some slight differences between the two versions.

This article has had some of the original links removed, and was published on Monday, February 23, 2007 at 03:48AM.

WE FIRST TOLD you about Intel’s GPU plans last spring, and the name, Larrabee, last summer. That brings up the question of just what the heck it is, other than the utter death of Nvidia.

Intel started talking about Larrabee last week with VR-Zone (nice catch, guys), (Update May 17, 2009 – Link fixed, same story, new URL) so I guess that makes it open season on info. VRZ got it almost dead on: the target is 16 cores in the early 2009 time frame, but that is not a fixed number. Due to the architecture, that can go down in an ATI x900/x600/x300 fashion, maybe 16/8/4 cores respectively, but technically speaking it can also go up by quite a bit.

What are those cores? They are not GPUs, they are x86 ‘mini-cores’: basically small, dumb, in-order cores with a staggeringly short pipeline. Each core runs 4 threads, so the 16-core part totals 64 threads per “CGPU”. To make this work as a GPU, you need vector instructions, so a wide vector unit is strapped onto each core. The instruction set, an x86 extension for those paying attention, will have a lot of the functionality of a GPU.
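To make the arithmetic concrete, here is a minimal sketch of the thread math, assuming the 16/8/4-core variants and 4 threads per core described above; the core counts are speculation at this point, not announced specs:

```c
/* Hypothetical sketch: hardware thread counts for the rumored Larrabee
 * variants. The 4-threads-per-core figure comes from the article; the
 * 16/8/4 split is the speculative x900/x600/x300-style lineup above. */
#include <stdio.h>

int main(void) {
    const int threads_per_core = 4;          /* per the article */
    const int core_variants[]  = {16, 8, 4}; /* speculative high/mid/low parts */

    for (int i = 0; i < 3; i++) {
        int cores = core_variants[i];
        printf("%2d cores x %d threads/core = %3d hardware threads\n",
               cores, threads_per_core, cores * threads_per_core);
    }
    return 0;
}
```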

This results in a ton (non-metric) of threads running a super-wide vector unit with the controls in x86. You use the same tools to program the GPU as you do the CPU, using the same mnemonics, and the same everything. It also makes things a snap to use the GPU as an extension to the main CPU.
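As a rough illustration of the “same tools, same mnemonics” point, here is an ordinary, hypothetical C kernel; nothing in it is Larrabee-specific, which is exactly the point, since under this model the same x86 toolchain would compile it for the host CPU or for the mini-cores, with the wide vector unit simply chewing through more lanes per loop iteration:

```c
/* A hypothetical, plain x86-compilable kernel: scale-and-add over an
 * array. Under the programming model described above, the same source,
 * compiler, and debugger would target both the host CPU and the CGPU. */
void saxpy(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```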

Rather than taking the traditional 3D pipeline of putting points in space, connecting them, painting the resultant triangles, and then twiddling them, and simply doing it faster, Intel is throwing that out the window. Instead you get the tools to do things any way you want; if you can build a better mousetrap, you’re welcome to do so, and Intel will support you there.
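To show what rolling your own pipeline means in practice, here is a minimal software-rasterization sketch using the classic edge-function test; it is purely illustrative and not Intel code, just the kind of inner loop that would run as ordinary software on the mini-cores instead of in fixed-function hardware:

```c
/* Minimal half-space (edge-function) triangle rasterizer: the kind of
 * step fixed-function GPUs hard-wire, written as ordinary software.
 * Fills a triangle with a flat color into a width*height framebuffer. */
#include <stdint.h>

static float edge(float ax, float ay, float bx, float by, float px, float py) {
    /* Sign tells which side of edge A->B the point P lies on. */
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

void fill_triangle(uint32_t *fb, int width, int height,
                   float x0, float y0, float x1, float y1, float x2, float y2,
                   uint32_t color) {
    /* Bounding box of the triangle, clamped to the framebuffer. */
    int minx = (int)(x0 < x1 ? (x0 < x2 ? x0 : x2) : (x1 < x2 ? x1 : x2));
    int maxx = (int)(x0 > x1 ? (x0 > x2 ? x0 : x2) : (x1 > x2 ? x1 : x2));
    int miny = (int)(y0 < y1 ? (y0 < y2 ? y0 : y2) : (y1 < y2 ? y1 : y2));
    int maxy = (int)(y0 > y1 ? (y0 > y2 ? y0 : y2) : (y1 > y2 ? y1 : y2));
    if (minx < 0) minx = 0;
    if (miny < 0) miny = 0;
    if (maxx >= width)  maxx = width - 1;
    if (maxy >= height) maxy = height - 1;

    for (int y = miny; y <= maxy; y++) {
        for (int x = minx; x <= maxx; x++) {
            float px = x + 0.5f, py = y + 0.5f;  /* sample at pixel center */
            float w0 = edge(x1, y1, x2, y2, px, py);
            float w1 = edge(x2, y2, x0, y0, px, py);
            float w2 = edge(x0, y0, x1, y1, px, py);
            /* Inside if all three edge tests agree, either winding order. */
            if ((w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                (w0 <= 0 && w1 <= 0 && w2 <= 0)) {
                fb[y * width + x] = color;
            }
        }
    }
}
```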

Those are the cores, but how are they connected? That one is easy: a large, wide, bi-directional ring bus. Think four digits of bit-width, not three, and Tbps of bandwidth, not Gbps. It should be ‘enough’ for the average user; if you need more, well, now is the time to contact your friendly Intel exec and ask.
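For a sense of scale, the back-of-the-envelope math looks like this; the 1024-bit width and 2 GHz clock are illustrative assumptions on my part, since all we have to go on is “four digits of bit-width” and “Tbps”:

```c
/* Back-of-the-envelope ring bandwidth: width (bits) x clock (Hz) per
 * direction. The numbers below are illustrative, not disclosed specs. */
#include <stdio.h>

int main(void) {
    const double ring_width_bits = 1024.0; /* "4 digits of bit-width" (assumed) */
    const double clock_hz        = 2.0e9;  /* 2 GHz, purely illustrative */

    double tbps = ring_width_bits * clock_hz / 1e12;        /* terabits/s  */
    double gbs  = ring_width_bits * clock_hz / 8.0 / 1e9;   /* gigabytes/s */
    printf("%.0f-bit ring at %.1f GHz = %.3f Tbps per direction (%.1f GB/s)\n",
           ring_width_bits, clock_hz / 1e9, tbps, gbs);
    return 0;
}
```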

As you can see, the architecture is stupidly scalable: if you want more CPUs, just plop them on; if you want fewer, delete nodes, not a big deal. That is why I said 16, but it could change more or less on a whim. Scalability is limited by bandwidth usage, and 20- and 24-core variants seem realistic.

The current chip is 65nm and was set for first silicon in late 07, last I heard, but this was undoubtedly delayed when the project was moved from late 08 to 09. The above specs are for a test chip; if you see a production part, it will almost assuredly be on 45nm. The one being worked on now is a test chip, but if it works out spectacularly, it could be made into a production piece. What would have been a hot and slow single-threaded CPU is an average GPU nowadays.

Why bring up CPUs? When I first heard about Larrabee years ago, it was undecided where the thing would slot in, CPU or GPU. It could have gone the way of Keifer/Kevet, or been promoted to full CPU status. There was a lot of risk in putting out an insanely fast CPU that can’t do a single thread at speed to save its life.

The solution would be to plop a Merom or two in the middle, but seeing as the chip was already too hot and big, that isn’t going to happen (heh heh, must bite tongue), so instead a GPU was born. I would think the whole GPU notion is going away soon, as the concept gets pulled on-die, or more likely adapted as tiles on a Fusion-like architecture.

In any case, the whole idea of a GPU as a separate chip is a thing of the past. The first step is a GPU on a CPU like AMD’s Fusion, but this is transitional. Both sides will pull the functionality into the core itself, and GPUs will cease to be. Now do you see why Nvidia is dead?

So, in two years, the first steps toward GPUs going away will hit the market. From there, it is a matter of shrinking and adding features, but there is no turning back. Welcome the CGPU. Now do you understand why AMD had to buy ATI to survive? S|A

Updated May 17, 2010: This is the chip I describe as Larrabee 0 in the new story from today, Larrabee 2 is now Larrabee 3. It never appears to have taped out. The chip itself didn’t work all that well in iteration 0, but the ring bus went spectacularly well. Spiritual derivatives of that ring now form the heart of Nehalem EX (AKA Becton) and soon Sandy Bridge.

The part I was laughing about above was that people told me there was more than a passing interest in asymmetric multi-cores at Intel for a while, but that trend seems to have cooled. A lot. Suddenly. The closest to that was Haswell, and we know what happened there. I am not sure what happened to kill that trend at Intel, but something sure did. -Charlie
