When you follow a company for long enough, you can read their messaging like a book, and Intel is scared silly. What is scaring them? FPGAs. Why? Because those chips could cost them most of their data center margins.
It all started out with an Oracle press release about the new Exadata X4-8 appliance with a custom Xeon E7v2/Ivy-EX CPU, one made specifically for Oracle. This already impressive chip has a new feature enabled, dynamic core count scaling, so it can now scale frequency, voltage, and cores on the fly. This will allow high clocks for serial portions of code and then a shift to lower clocks and higher core counts for more parallel code. In short, it allows the CPU to maximize performance on different problem types dynamically on the same CPU.
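To make the narrow-and-fast vs wide-and-slow tradeoff concrete, here is a toy model in Python. It is only a sketch: the operating points are made-up illustrative numbers, not real Xeon E7v2 specs, the names `Config`, `runtime`, and `best_config` are hypothetical, and the runtime formula is a simple Amdahl's-law-style estimate, not anything Oracle or Intel has described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    cores: int   # number of active cores
    ghz: float   # clock speed of each active core


# Hypothetical operating points: fewer cores buy higher clocks.
# These numbers are illustrative only, not real chip specifications.
CONFIGS = [Config(4, 3.6), Config(8, 3.0), Config(15, 2.4)]


def runtime(work: float, parallel_fraction: float, cfg: Config) -> float:
    """Amdahl-style estimate: the serial part runs on one core,
    the parallel part is split evenly across all active cores."""
    serial = work * (1.0 - parallel_fraction) / cfg.ghz
    parallel = work * parallel_fraction / (cfg.ghz * cfg.cores)
    return serial + parallel


def best_config(parallel_fraction: float, configs=CONFIGS) -> Config:
    """Pick the operating point with the lowest estimated runtime."""
    return min(configs, key=lambda c: runtime(1.0, parallel_fraction, c))


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        print(p, best_config(p))
```

Under these assumed numbers, fully serial code picks the 4 core point and fully parallel code picks the 15 core point. The interesting part is everything in between, which is where knowing the workload matters.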
Technically speaking this isn’t a really big trick because much of the functionality was already in place. The new features are mainly a firmware update that allows cores to be fully turned off during operation instead of just put into sleep mode, likely using already existing communications pathways and RAS functions. Turning cores off can also have serious effects on frequency scaling for the remaining cores, which is the main point of the changes. It is a win/win that should have come a long time ago.
The real problem with dynamic core scaling is twofold: the OS and when to make the changes. On the OS side there is almost no chance of anyone using Windows for a Xeon-EX system. Contrary to PR statements, you would have to be borderline insane to deploy a Microsoft product in a mission critical environment. While technically Windows does support Xeon-EX RAS features like hot plug memory, do you want to risk a mission critical process on that? If so we wish you well at your next place of employment. Dynamic core scaling will fall under the same heading: it may technically work but no sane admin would risk it, much less recommend it, in a Microsoft environment.
That brings us to the other half of the issue, when to do it. Oracle has control of the whole stack from Solaris to the application itself so it can make sure things work right, something Microsoft hasn’t figured out in 20+ years of claiming success. Oracle can also allow the apps to pass hints down to the OS and even to the hardware and firmware. In short, the entire stack can signal its needs top to bottom and back.
This is why Oracle can have custom firmware to allow core count scaling successfully: they can actually take advantage of it. Apps running on Windows will most likely crash long before this becomes an issue, not that they could signal the necessary changes even if they managed to stay up. Generic *NIXes don’t have the uptime problem, but there aren’t really any widely adopted methods to pass the signals that apply across the wide variety of potential uses. Generic models don’t play well with this type of feature.
Oracle has everything in hand top to bottom and they know the exact workloads, or in this case more likely the specific workload, that will be run on the box. Because of this whole stack view they can do things others can’t. Oracle knows when an app will need narrow and fast vs wide and slow on the specific hardware involved, and how that interacts with their OS, in ways others can’t. That allows them to use features found in Ivy-EX for things like RAS to scale core counts dynamically. If a system can disable a core on the fly for RAS, it is 90% of the way to dynamic core count scaling.
What is left is implementing bi-directional messaging at the app level, and then allowing the system to turn cores back on. It goes without saying that while this is quite technically possible, to get a benefit from the changes you need to have a very precise view of what the software needs. Technically speaking that is the big advance and it is likely almost all Oracle’s doing.
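As a rough illustration of that kind of app-level messaging, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `Phase`, `CoreController`, and `apply_hint` are hypothetical names, and the controller only simulates core state in memory. It does not touch real hardware and is not Oracle's or Intel's actual interface, which has not been made public.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    SERIAL = "serial"      # app wants narrow and fast
    PARALLEL = "parallel"  # app wants wide and slow


class CoreController:
    """Hypothetical stand-in for the OS/firmware layer.
    It only tracks which cores are 'online' in memory."""

    def __init__(self, total_cores: int, min_cores: int = 1):
        if not 1 <= min_cores <= total_cores:
            raise ValueError("min_cores must be between 1 and total_cores")
        self.total_cores = total_cores
        self.min_cores = min_cores
        self.online = set(range(total_cores))

    def apply_hint(self, phase: Phase, wanted_cores: Optional[int] = None) -> dict:
        """Downward message: the app states what phase it is entering.
        Upward message: the returned dict reports what was granted."""
        if phase is Phase.SERIAL:
            target = self.min_cores
        else:
            target = self.total_cores if wanted_cores is None else wanted_cores
        # Clamp the request to what the simulated hardware can do.
        target = max(self.min_cores, min(self.total_cores, target))
        self.online = set(range(target))
        return {
            "online_cores": len(self.online),
            "offline_cores": self.total_cores - len(self.online),
        }


if __name__ == "__main__":
    ctl = CoreController(total_cores=8, min_cores=2)
    print(ctl.apply_hint(Phase.SERIAL))
    print(ctl.apply_hint(Phase.PARALLEL))
```

The point of the reply dict is the "and back" half of the messaging: the app learns what it was actually granted instead of assuming its request was honored, and that only works when whoever wrote the app also knows what the layers below can deliver.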
Back to the point of this story, the dynamic core count scaling is quite a clever hack that rides mostly on existing hardware features. That is the good part. Then, in the middle of their release, Intel compared it to the vaporware FPGA-bearing Xeon debacle from a few weeks ago. Intel has had a tendency to beat dead horses in front of the press to hopefully create common knowledge if not the truth. You might recall the theme from ultrabooks, graphics capabilities, Atom, tablet suitability, and margin promises on certain lines. In short, repeat and have it echo in the hopes that Wall St doesn’t notice the underlying technical problems. If they notice, they might look into them.
Intel is scared silly of the FPGA in the data center. If you don’t understand why, read the story linked above. If you don’t think it is real, look at the second picture. If you still don’t think it is a big deal for Intel, multiply the socket counts by the price of a Xeon-EP and you get big numbers fast. Then realize that Bing is the pipsqueak in this market. What is applicable to them is applicable to the others, which are many times Bing’s size. That brings things from painfully large numbers to borderline unfathomable ones.
This is why Intel is desperate to appear relevant in the space, anything less may lead to Wall St looking closely at the underlying numbers we discussed earlier. That would be very bad for Intel stock, very bad indeed. Unfortunately their response is vapor. Worse yet, it is vapor that does not address the actual problem, it makes it worse. Since the company is unable to address the issue, they are trying to rewrite the common wisdom in the hopes of deflecting any in-depth analysis. It didn’t work the last several times and it won’t work this time either. The only difference here is that the stakes are much, much higher in the datacenter game.