A lot of people are in a tizzy because AMD (NYSE:AMD) has changed the upcoming Seoul CPU from 10 to 8 cores. The general response ranges from AMD incompetence to apocalypse, but all it really signals is a lack of technical understanding on their part.
The slide in question was the server roadmap we wrote up here. It introduces Piledriver cored Abu Dhabi and Seoul chips, successors to the Bulldozer based Interlagos and Valencia respectively. The base part has 4 modules/8 cores, and the bigger variant is two of those in a package. The big controversy is that they were supposed to be 5 module/10 core parts.
Instead of thinking about why, or horror of horrors, doing their purported job and asking AMD why this change happened, most leapt to the conclusion that the sky had fallen. We don’t think the majority of the ‘journalists’ covering this ‘controversy’ actually read the roadmaps, much less compared them to the older versions. If they did, it is pretty obvious that they didn’t understand the tech behind the change, because the reason the cores were dropped is right there on the roadmap.
What do we mean by this? Well, Seoul and Abu Dhabi not only lost two cores per die, they also lost the new socket. Both G34 and C32 live on for another generation, and that in and of itself is not a bad thing. If you stop and think about it, when does AMD change server sockets? Only when they change memory types. G34 and C32 use DDR3, the last server socket, F, used DDR2, and the one before it DDR. There have been minor updates along the way, but fundamentally, there isn’t anything other than memory that causes a socket change. There might be other changes, but they wait for the memory induced change.
Where are we going with this? Well, the new socket was meant to use DDR4. Unfortunately, DDR4 is MIA, and for the remainder of 2012, there won’t be anything close to a standards compliant DDR4 chip. Why? Because there is currently no DDR4 standard, it is being evolved, and anything called DDR4 in a press release is nothing more than a low volume test part to enable that evolution. As things stand, even 2013 is optimistic for consumer parts on the market, much less affordable ones in quantity.
No DDR4 standard, much less DDR4 chips on the market, means nothing to plug in to DDR4 DIMM slots. Bringing out a platform that necessitates DDR4 is not all that bright a marketing move, although there are many worse examples out there. What do you do if you are AMD? You put the new core on the socket that does have memory in existence, in this case DDR3 and G34/C32. So far, so well thought out and logical.
Why did the core count drop? Easy. If the Piledriver core un-borks the Bulldozer core, and early Trinity numbers are encouraging in this regard, IPC goes up, throughput goes up, and in general, more gets done. This is both per core and per socket. That in turn puts more demand on, wait for it, the memory subsystem. As the saying goes, you have to feed the beast. With the same sized spoon as the last non-un-borked cores. See the problem?
If there is more pressure on the already overstressed memory subsystem, and you are making changes at the core level to add to the pain, you are only going to make that bottleneck worse. Adding two more cores will most likely mean things get slower per core, and they are idle more than architects tend to be comfortable with.
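To put rough numbers on the same sized spoon, here is a minimal sketch of peak memory bandwidth per core on a two-die package. The inputs are our assumptions for illustration, not figures from the roadmap: four DDR3 channels on G34, DDR3-1600, 64-bit channels, and theoretical peak rather than sustained bandwidth. The function names are made up for this example.

```python
# Assumptions (not from the roadmap): DDR3-1600, 64-bit channels,
# four channels per G34 package, theoretical peak bandwidth only.
TRANSFERS_PER_SEC = 1600e6  # DDR3-1600 moves 1600 million transfers/s
BYTES_PER_TRANSFER = 8      # one 64-bit channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given channel count."""
    return channels * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER / 1e9


def bandwidth_per_core_gbs(channels: int, cores: int) -> float:
    """Even split of the peak bandwidth across all cores in the package."""
    return peak_bandwidth_gbs(channels) / cores


if __name__ == "__main__":
    # Two dies per package: 2 x 8 cores as announced, 2 x 10 as once planned.
    for cores in (16, 20):
        share = bandwidth_per_core_gbs(channels=4, cores=cores)
        print(f"{cores} cores: {share:.2f} GB/s per core")
```

Under those assumptions, going from 16 to 20 cores on the same four channels cuts each core's share of peak bandwidth by a fifth, before any IPC gain raises per-core demand.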
You can, and AMD does, alleviate this problem with other changes and optimisations, but fundamentally, two more cores probably wouldn’t do squat all for performance. Yes, you could show a gain in many synthetic workloads, possibly some real ones too, but many things that require bandwidth would be just as likely to get worse. So two more cores burn power, waste die space and may well drop performance in the areas that AMD is actually selling decently.
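The synthetic-versus-bandwidth-bound split can be sketched with a simple cap model: sustained throughput is the lesser of what the cores can compute and what the memory system can deliver. Everything here is a hypothetical illustration, including the function name, the per-core rate, the bytes-per-operation figures and the 51.2 GB/s bandwidth. The model also ignores contention, so it can only show flat scaling, not the outright slowdown that real bandwidth-bound code may see.

```python
def sustained_gops(cores: int, gops_per_core: float,
                   bytes_per_op: float, bandwidth_gbs: float) -> float:
    """Throughput in Gops/s, capped by compute or by memory bandwidth.

    Hypothetical model: each operation needs `bytes_per_op` bytes from
    DRAM, and the package cannot exceed `bandwidth_gbs` GB/s in total.
    """
    compute_bound = cores * gops_per_core
    memory_bound = bandwidth_gbs / bytes_per_op
    return min(compute_bound, memory_bound)


if __name__ == "__main__":
    BANDWIDTH = 51.2  # GB/s, assumed peak for the whole package

    # Cache-friendly "synthetic" workload: little DRAM traffic per operation.
    for cores in (16, 20):
        print("light traffic:", cores, sustained_gops(cores, 1.0, 1.0, BANDWIDTH))

    # Bandwidth-hungry workload: heavy DRAM traffic per operation.
    for cores in (16, 20):
        print("heavy traffic:", cores, sustained_gops(cores, 1.0, 4.0, BANDWIDTH))
```

With the made-up light workload the extra four cores show up as a gain. With the heavy one both core counts hit the same memory ceiling, so the added cores contribute power draw and die area but no throughput.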
The solution? If you rule out magic DDR4 fairies, you are left with losing the two added cores. Since the process remains the same, 32nm SOI, you can go three ways. If your design is pad bound, you can use the freed up space for doodling, family photos, or more cache. Leaving it blank is so last year, so that probably won’t happen.
If your design is not pad bound, you can always make the die smaller, saving cost, power, and upping yield. The last option is borderline cheating though, basically don’t actually do anything, just leave all 5 modules in place. This option means yields out of the gate are pretty good, a single defect in a core is still a top bin part, and two core killing defects is now saleable instead of scrap. If a company is even more creative, it could test each module for power draw, and fuse off the highest one, possibly shaving a few watts off the TDP.
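For the leave-all-5-modules-in-place option, a back-of-the-envelope sketch shows why a spare module helps top bin yield and how the fuse-off pick might work. The assumptions are ours, not AMD's: module defects are independent, the 90% chance of a module being good is a made-up figure, and the per-module power readings are invented. The function names are hypothetical too.

```python
from math import comb


def yield_at_least(good_needed: int, modules: int, p_good: float) -> float:
    """Probability that at least `good_needed` of `modules` modules work,
    assuming each module is good independently with probability `p_good`."""
    return sum(
        comb(modules, k) * p_good**k * (1 - p_good) ** (modules - k)
        for k in range(good_needed, modules + 1)
    )


def modules_to_keep(power_watts: list[float], keep: int = 4) -> list[int]:
    """Indices of the `keep` lowest-power modules; the rest get fused off."""
    ranked = sorted(range(len(power_watts)), key=lambda i: power_watts[i])
    return sorted(ranked[:keep])


if __name__ == "__main__":
    p = 0.9  # hypothetical chance that any one module is defect free
    print(f"4 of 4 good: {yield_at_least(4, 4, p):.4f}")
    print(f"4 of 5 good: {yield_at_least(4, 5, p):.4f}")

    # Made-up power readings in watts for modules 0 through 4.
    print(modules_to_keep([18.0, 17.2, 19.5, 17.8, 18.1]))
```

With these invented inputs, a die that needs all four of its four modules to work is a full-spec part about 66% of the time, while a five module die that only needs four is one about 92% of the time. The picker then fuses off module 2, the hungriest in the made-up readings.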
One thing that is not a sane plan would be to add memory channels. This would both necessitate a new socket, thus new boards, and have a one generation lifespan. On top of this, it would massively up package costs, complicate board routing, and in general add far far more to the cost than it is worth. The enterprise RFQ writer’s term for this type of part is, “avoid”. Same for a one year G34/C32 socket with PCIe3 pinouts, nice idea but a pox on sales.
Why the cores were removed is pretty simple: they couldn’t be fed. Until DDR4 comes out, it looks like AMD is stuck with G34 and C32. This limits what they can feed to the chip going in to that socket, and therefore the total MIPS that can be sustained on it. The only sane option left is to drop the last two cores. How it is accomplished is a topic for debate, why it was is not. S|A
Author’s note: Anyone got some insight in to this in relation to Romley/Sandy-Ex and Ivy evolution or time to market? Any official statements lurking on the web?
Author’s note 2: Thanks to Chris Angelini for being an excellent devil’s advocate about the technology in this article.