Intel’s Broadwell-E should not have been released

Computex 2016: Minor gains, major spin, and no real tech

At Computex Intel launched Broadwell-E to no fanfare, and for a good reason. This is a chip that has SemiAccurate reaching for reasons to justify its existence, but Intel had to stretch far further.

As you might have figured out from the name, Broadwell-E is the consumer version of the excellent Broadwell-EP/EX chips. The silicon changes are zero; it is the same die with some functionality fused off. But as you will see, what makes a world-class server part does not translate into a passable consumer chip. What little enthusiasm we had was shattered by Intel’s messaging around the launch; they flat-out should not have released this part.

What is Broadwell-E? It is the smallest of the three Broadwell-EP dies, known as LCC; MCC and HCC are the medium and large variants. It measures 16.2×15.2mm (246.24mm²) with a total of 3.2B transistors. Please note that Intel would not disclose these numbers, or that Broadwell-E was the LCC die, saying, “The die size and transistor density are proprietary to Intel and we are no longer disclosing that detail.” Intel management seems determined to make the company irrelevant to consumers, and this new attitude is a wonderful start. In any case this information was already public and released by Intel, which makes their stance all the more curious.
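
Better yet, both of the ‘proprietary’ figures fall straight out of the numbers above, all of which Intel itself previously published. A quick sanity check:

```python
# Sanity check on the figures Intel now calls "proprietary",
# using only numbers Intel previously released for the LCC die.
width_mm, height_mm = 16.2, 15.2   # die dimensions
transistors = 3.2e9                # 3.2B total

area_mm2 = width_mm * height_mm
print(f"Die area: {area_mm2:.2f} mm^2")                       # 246.24 mm^2
print(f"Density:  {transistors / area_mm2 / 1e6:.1f}M/mm^2")  # ~13.0M transistors/mm^2
```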

Broadwell-E lineup and specs

If you look closely, you will see changes

As you can see, there are now four BDW-E SKUs, up from three in the previous few generations. The 6xxx naming is an unfortunate carry-over from Intel’s past marketing games where they would inflate the model number of the -E parts to differentiate them from their consumer die based brethren. These games have now caught up with Intel, and Broadwell based chips are in the same ‘family’ as Skylake consumer parts. The two share nothing, and the Skylake based 6xxx parts are much better suited to the task at hand, gaming. Why? Let’s start with a look at the previous Haswell-E parts.

Haswell-E specs and lineup

Haswell-E is almost the same

What does BDW-E bring to the table over HSW-E? How about a massive 100MHz clock boost, yes you read that right, 100 freaking MHz worth of pure upside!!! Yes, if you abandon your expensive Haswell-E and spend nearly $600 more, you too can burn through games 2.857% faster (base) or 2.703% faster (best-case turbo)! Wow, if that isn’t worth $412-1569, I don’t know what is. Now you see why SemiAccurate is calling Broadwell-E a joke for consumers?
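
If you want to check our math, those percentages fall out of a 100MHz bump applied to 3.5/3.7GHz base/turbo clocks, which appear to be the 6850K vs. 5930K comparison:

```python
# The generational "upgrade" in percentage terms: +100MHz on what
# appear to be 3.5/3.7GHz base/turbo clocks (6850K vs. 5930K).
old_base, old_turbo = 3500, 3700   # MHz
bump = 100                         # MHz, the entire generational gain

print(f"Base gain:  {bump / old_base:.3%}")   # 2.857%
print(f"Turbo gain: {bump / old_turbo:.3%}")  # 2.703%
```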

Let’s take a look at the four SKUs in a bit more detail though. The lowest end part is a 6-core device that is artificially crippled by the removal of 12 PCIe lanes. This is important because it precludes the one actual benefit a -E part has over its consumer die brethren: two-GPU support. If you want to do 2x PCIe3 x16, the Intel -E lines are the only realistic way to accomplish that goal.

The $412 6800K exists for one reason: to make the whole line look less expensive. Intel was very direct in messaging that the BDW-E 6800K starts close to where the 6700K is priced; doesn’t that sound better than the 6850K costing almost twice as much as the 6700K? That said, both realistically support one GPU, and the Skylake cores on the 6700K are measurably better than the Broadwell ones in the -E parts. They also have a 400MHz higher base and turbo clock. Did we mention that the overwhelming majority of games have an indivisible thread that is clock bound, but almost none ever use a full four cores, much less eight threads? For the non-technical out there, this is the long way of saying 6/12 cores/threads is idiotic for gaming. Luckily BDW-E offers up to 10/20 for a mere $1569.

So the Skylake based 6700K is cheaper, faster, and better in every regard but PCIe lanes than the lowest end 6800K. What about the more expensive devices? The 6-core 6850K is the sweet spot of the line; it is ‘reasonably’ priced, for some definitions of reasonable, at $587 and clocked at 3.6/3.8GHz base/turbo. It offers the full 40 PCIe lanes so you can have two x16 GPUs and a couple of x4 NVMe drives for good measure. From there the core/thread counts go up to 8/16 and 10/20 but the clocks decrease steadily. Since Intel chooses to fuse off L3 cache along with cores, the cache does go up, but not enough to paper over the lower clocks.
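
The lane budget is worth spelling out, because it is the whole case for the 6850K over the crippled 6800K:

```python
# The 6850K's 40-lane PCIe 3.0 budget in the scenario described above.
lanes_total = 40
gpus = 2 * 16   # two GPUs at x16 each
nvme = 2 * 4    # two NVMe drives at x4 each

print(f"Used: {gpus + nvme} of {lanes_total} lanes")  # 40 of 40
# The 6800K's 28 lanes can't fit two x16 GPUs (32 lanes) even
# before you add storage, hence "artificially crippled".
```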

Since gaming performance is almost linearly tied to clock rate, the 8 and 10 core BDW-Es are going to be worse for their intended market than the 6-core non-crippled parts. Luckily for no one at all, the slower, worse parts are significantly more expensive at an eye-watering $999 and $1569. If you crave 5 or 10MB more L3 and are willing to pay double or triple the price of a 6850K, these are your next chips. For those who want gaming performance, the only SKU in this new quartet worth considering is the 6850K; all the others are either crippled or slower. Better yet, buy a 6700K and a single-card dual GPU. Or wait for the big AMD Zen, but I can’t say why yet, heh, Zeppelin.

Luckily for the Intel marketeers, there is a solution to this clock speed mess; having a vastly cheaper part that is better in every way than your flagship simply won’t do. To rectify this, Intel invented “Intel Turbo Boost Max Technology 3.0”, something that is called out separately from Turbo 2.0. Why? Because it isn’t real; it is a complete marketing sham that does next to nothing for the user.

Intel promised the moon for the new Turbo Max 3.0 and then didn’t go into how it works. SemiAccurate asked how it worked and was given a useless quote from a technical guide Intel sent to others, but not to SemiAccurate. We asked, “The new turbo, when it ID’s the fastest core, how does it assign critical processes to that core? Software? Firm/hardware? Is it Windows only or OS agnostic?”

The reply was:

“Intel Turbo Boost Max Technology 3.0 works by using a driver coupled with information stored in the CPU to identify and direct workloads to the fastest core on the die first, allowing workloads to run more quickly and efficiently. The driver also allows for custom configuration via a whitelist that gives end users the ability to set priority to preferred applications. The driver MUST be present on the system and configured correctly in order for this benefit to be realized as current operating systems cannot effectively route workloads to ordered cores.

People can then use Intel Turbo Boost Max Technology 3.0 Application to further optimize. I’ve attached an addendum that further explains the options once the application is running.”(Note: Intel didn’t actually attach anything to the email.)

Since Intel declined to answer the questions we asked, SemiAccurate dug in. Max 3.0 is a driver that only works with Windows, it isn’t actually a hardware or board feature like Intel wants you to believe. It only works with specific boards, specific firmware builds, and with specific drivers, none of which are available as a single package. Good luck to the average user trying to get it all to work, and don’t expect it to be supported once the launch party is over.

Should it work, you are given a text box into which you can input a list of programs that get priority. Intel says the program, sorry, feature in marketspeak, will identify the highest clocking core and effectively pin the hardest working thread to it for maximum performance and boost. How Intel identifies which thread should be pinned would be interesting to hear about, but Intel refuses to actually talk tech so we can’t explain it. As far as we can tell you can only list the process, which may or may not have the intended result. In any case, if you don’t, the ‘technology’ will pin whatever task has focus automatically.
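
For the curious, pinning a process to a chosen core is bog-standard OS functionality, not new silicon. Here is a minimal sketch of the concept using the psutil library; the core index is a made-up example, since Intel’s driver internals are closed:

```python
# Illustrative only: what "direct the workload to the fastest core"
# boils down to is ordinary CPU affinity. Intel's Max 3.0 driver is
# closed, so the core index here is a hypothetical example.
import psutil

FASTEST_CORE = 3  # hypothetical index of the best-binning core

def pin_to_fastest(pid: int) -> None:
    """Restrict a process to the single 'fastest' core."""
    proc = psutil.Process(pid)
    print(f"{proc.name()} old affinity: {proc.cpu_affinity()}")
    proc.cpu_affinity([FASTEST_CORE])  # the OS scheduler keeps it there
    print(f"{proc.name()} new affinity: {proc.cpu_affinity()}")

if __name__ == "__main__":
    pin_to_fastest(psutil.Process().pid)  # demo: pin this script itself
```

Nothing in that sketch needs a new CPU, a specific board, or a specific firmware build, which tells you how much of Max 3.0 is actually hardware.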

Intel claims that it is “more than 15% faster” based on SPECInt_base2006 running on a 6950X vs a 5960X. We feel obliged to point out two things here: clocks and marketing. Since SPECInt_base2006 is a single threaded test and both listed parts clock to 3.5GHz, that number should reflect the IPC differences between the Haswell and Broadwell cores. That difference is maybe 5%; 15% is right out. If you read the fine print on the slide, you will see that Intel recompiled the binaries, so the 15% difference they are attributing to Max 3.0 is likely 2/3 due to compilers, 1/3 due to core microarchitecture, and 0/3 due to Turbo Max 3.0.
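
Run the numbers and that split is about right, taking our ~5% IPC estimate as given:

```python
# Decomposing Intel's "more than 15% faster" SPEC claim, assuming
# the ~5% Haswell -> Broadwell IPC gain estimated above.
claimed = 1.15   # Intel's overall claim, 6950X vs. 5960X at equal clocks
ipc     = 1.05   # ballpark core microarchitecture gain

leftover = claimed / ipc
print(f"Unexplained by the core: {leftover - 1:.1%}")  # ~9.5%
# ~9.5 points of 15 for the recompile, ~5 for the core, 0 for Max 3.0:
# roughly the 2/3, 1/3, 0/3 split.
```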

Secondly we will point out that the relevance of SPECInt_base2006 to gaming is a little less than zero; Intel didn’t use a game to promote this ‘feature’ because it wouldn’t show any gains in the real world. They had to resort to an out of date, deprecated server core benchmark to show any gains, how ethical. The Intel of old would never have resorted to such underhanded shenanigans. Max 3.0 is a piece of software that could be ported to any Intel CPU, past or present, and would do the same ‘good’ there too, namely nothing. It is as much of a ‘valuable feature’ as vendor bloatware on a new PC.

Luckily Intel has a justification as to why 10 slower cores are better than 6 faster ones when fewer than four are actually needed. That justification? Mega-Tasking! Yes, Intel made up a new word to promote a less useful product that costs 3x as much as a better suited one. You know all those gamers who want to game, render 4K movies on the CPU in the background without hardware acceleration, stream their videos with CPU encoding, and do 17 other things at once? The 10-core 6950X is for you. You can Mega-Task!!!! Granted your actual game will run slower even if none of these background tasks take resources it needs, but for some reason this is a good thing. It was painful to hear Intel pitch this, but you knew they had no other way to justify the 8 and 10 core parts. I honestly felt bad for the product spokespeople on the call.

Broadwell-E performance claims

They don’t want you to read the fine print

More hilarity on this front came from a slide on the multi-threaded performance of the new 6950X, using Cinebench R15 to showcase the raw grunt. Intel claims up to 35% better performance versus the previous generation and up to 2x better performance than 4-cores. Let’s take a look at these claims in more detail; they show how a desperate Intel can bend the press into repeating borderline irresponsible claims. Please email the author with examples of sites that repeated these bullet points if you find them.

First comes the 35% number, 25% of which comes simply from two more cores; Cinebench is trivially threaded. Throw in another 5% or so for HSW->BDW core IPC improvements and you are left with maybe 5%, easily explained by slightly higher memory clocks, DDR4-2400 vs DDR4-2133. In short, if you can utilize 10/20 cores/threads full-time, the 6950X is the part for you. Actually a dual socket BDW-EP Xeon is a much better buy, but that would ruin the messaging.
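
Multiply the factors out rather than adding them and the residue is even smaller:

```python
# Where Cinebench R15's "up to 35%" plausibly comes from (6950X vs. 5960X).
cores   = 10 / 8   # 1.25x: two extra cores on a trivially threaded test
ipc     = 1.05     # ballpark Haswell -> Broadwell per-core gain
claimed = 1.35     # Intel's headline number

residual = claimed / (cores * ipc)
print(f"Left unexplained: {residual - 1:.1%}")  # ~2.9%
# A residual that small is comfortably covered by DDR4-2400 vs. DDR4-2133.
```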

Then we have the “up to 2x” better than 4-cores, compared against the aforementioned Skylake based 6700K. Note that 10 cores is 2.5x as many as 4 cores, meaning the Skylake cores are notably better per core than the Broadwell ones. If your game is bound by a single indivisible thread, like 99.9+% of modern games are, you would be better off with a Skylake even before you consider the $1200 price difference. Multi-threaded performance is not a really good proof point for BDW-E now, is it?
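
The same slide math makes the per-core gap explicit:

```python
# What "up to 2x vs. 4 cores" implies about per-core throughput.
core_ratio = 10 / 4   # 2.5x the cores, 6950X vs. 6700K
speedup    = 2.0      # Intel's best-case multi-threaded claim

print(f"Broadwell-E per-core vs. Skylake: {speedup / core_ratio:.0%}")  # 80%
# Each Broadwell-E core delivers ~80% of a Skylake core here; clocks
# and IPC both favor the 6700K, exactly as argued above.
```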

Intel lists a bunch of similar numbers which show that the new BDW-E parts are at best a tepid gain over the existing parts and will often lose to them in the real world. This conclusion takes a little thinking to reach; you have to do the math rather than take the skewed bullet points Intel calls out in big yellow numbers. Intel is desperate to show that BDW-E is a step up, but they fail in every way; the part is borderline useless for the markets they claim it excels at.

Mega-Tasking on Broadwell-E claims

Something here doesn’t hold water

One claim that bothers SemiAccurate is that the new BDW-E is “up to 25% faster vs. previous gen” at Twitch + 4K gaming + encode. Since this kind of encoding and streaming is the sole domain of GPU acceleration, something the 6700K has via Quick Sync on its integrated GPU but BDW-E lacks entirely, we asked how this can technically be. Here is our question and Intel’s answer.

S|A: You claim BDW-E is better for livestreaming, why is a very expensive CPU better for this than a cheap GPU with H.265 encode? What am I missing here? New hardware?

Intel: It’s the combination of workloads and scenarios that gamers and content creators are looking to do at the same time. This is where we talked about the concept of mega-tasking or the experience of simultaneous, compute-intensive, multi-threaded workloads as part of their typical PC usage.

For gamers, they’re gaming in 4k, livestreaming, encoding their gameplay to the system for editing and uploading later, and communicating with their eSports teams all at the same time.

For content creators, they’re editing video footage, doing image retouching, creating and rendering graphics, downloading footage from cameras, and working on soundtracks.

The apps are all active and working simultaneously and require high performance systems to satisfy enthusiasts.

Yes, they didn’t answer the question because their claim is completely without merit. Encoding and streaming on a CPU will bring any of the four new BDW-Es to its knees and still not deliver adequate throughput, two added cores or not. This job is unquestionably the domain of the GPU and its hardware-accelerated encoders. Intel dodged the question because they simply could not justify their claims; two more cores do nothing to speed up the encode scenario they offered, period.

One glaring gap in the data is gaming, for which there were only two numbers: Firestrike and The Division. Firestrike scores 30% better than a 5960X because it is effectively a physics benchmark. Fair enough, but see the comments above about 25% more cores and architecture; Firestrike is a best case scenario and not a game benchmark, it is a compute benchmark. Similarly with The Division, which is a game, Intel claims “>85 Frames Per Second vs. 4 Cores”. If this doesn’t parse for you, that is because it is a nonsense number; Intel lists neither the competition nor what it scores. All you can tell is that a 6950X scored >85FPS. Any guesses as to why they didn’t put a single actual game benchmark in their briefing?

All this nonsense aside, there are actually three new features for overclockers: per-core overclocking, AVX ratio offset, and VccU voltage control. The first and the last are self-explanatory, but the AVX ratio offset is not. The BDW-EP/EX parts have an AVX clock that lowers the frequency of the core when AVX instructions are detected to avoid overtaxing the power delivery circuitry. This is a good thing, and for servers it means higher performance all around without compromising reliability.

On the consumer cores it means, well, we are not sure, so we asked. Instead of a technical answer we once again got marketing BS that barely parses as English. While we suspect it is just a new name for the server AVX frequency drop, we can’t say for sure. Here is our question and Intel’s answer.

S|A: Can you go more into the AVX offset ratio? Is this the same as the AVX clocks on HSW and BDW-EP?

Intel: This feature allows operating the processor at different turbo frequencies (ratio) based on the workloads. SSE based applications can be enabled to run at higher frequencies by allowing AVX2 instructions to run at lower frequencies. Benefits include more stable overclocking and higher Over clocking frequencies for SSE based applications.
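
If our suspicion is right and this is the server mechanism exposed as an overclocking knob, the arithmetic is simple: the AVX2 ratio is the normal ratio minus a configurable offset. A sketch with hypothetical multiplier settings, not Intel-published defaults:

```python
# Illustrative AVX ratio offset math; the multipliers are hypothetical
# overclock settings, not Intel-published defaults.
BCLK = 100       # MHz base clock
sse_ratio  = 42  # multiplier for SSE/scalar workloads
avx_offset = 3   # AVX2 runs this many bins lower

print(f"SSE clock:  {BCLK * sse_ratio} MHz")                 # 4200 MHz
print(f"AVX2 clock: {BCLK * (sse_ratio - avx_offset)} MHz")  # 3900 MHz
# An SSE overclock can stand without AVX2 loads overtaxing the
# power delivery, just like the server parts' AVX clock.
```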

So in the end, other than marketing doublespeak, what does Intel’s new Broadwell-E family bring to the table? Mainly <3% clock gains; the rest is either spin so powerful that it will be studied by physicists for years to come or flat-out nonsense. The 10-core version is utterly pointless, the 8-core is a regression from the non-crippled 6-core, and all are less suited to their main tasks than the Skylake based 6700K. If you have any Sandy Bridge based -E part or newer, there is absolutely no reason for you to upgrade. While there is an off chance that a minor feature or two is lurking under the marketing spin, Intel’s refusal to promote their own leadership features leaves us unable to recommend these new CPUs for any reason. S|A
