Haswell upgrades focus on power management

Graphics gets the headlines, power management makes it all happen

Intel has finally released the next big thing in CPUs, but what does Haswell bring to the table? Depending on how you look at things, either a lot or not much, but it certainly has enough interesting details to keep any geek happy.

The basics of Haswell don’t look all that different from the last two CPUs Intel put out, Sandy Bridge and Ivy Bridge. All of them have four cores max on the desktop, HT, a central ring bus, and a GPU on die. From a high-level view there isn’t much there, but the devil is in the details and there are a lot of details. The cores are entirely new but not radically different from their predecessors other than adding AVX2 support.

The first really new bit is that Haswell comes in three flavors, GT1, GT2 and GT3, whereas the last two only came in two variants, GT1 and GT2. With 10 and 20 shaders, the lower two Haswells represent incremental gains over the 6 and 16 in Ivy, especially since they use a variant of Ivy’s architecture.

Intel Haswell GT3 die marked

The die looks like this without the markings

GT3 is where things start getting interesting and it comes in two sub-flavors. To be a bit more precise, there is only one GT3 silicon variant; the second version, called GT3e, adds a 128MB eDRAM cache on the package acting as an L4 cache for the entire chip. All GT3s have 40 shaders, enough raw performance to get Intel out of the graphics doghouse. Performance won’t stun a gamer but it should be more than enough to drive a high-res screen at reasonable refresh rates, a feat Intel could not claim before today.

While the GPU is an evolution of the older Ivy shaders, there are serious enhancements to functionality. Intel added support for DX11.1, OpenCL 1.2, and OpenGL 4.0 to the feature list, along with 4K video support and three-screen capability via DP1.2. On the video encode/decode side Intel has added MJPEG to the codec list and now supports more streams. The idea here is to support better video conferencing, where simultaneous encode and decode are mandatory and additional streams can be put to good use too.

Intel Haswell packaging options

One package or two good sir?

Stepping outside the silicon for a moment, there are two main types of Haswell: one-package and two-package versions. The difference between the two is how the chipset is connected to the CPU die, either on the same package PCB or in a separate one. As we said in our earlier look at the SKUs, PGA parts are all two-package, as are the GT3e graphics versions; BGA parts can be either, based on TDP. Low TDPs get one package, high TDPs two. This would appear to be related to Ultrabooks, but GT3e parts are aimed at that market and still use two packages, likely because of their higher TDP. There are one-package GT3s however, so it seems to be an arbitrary line drawn by marketers, not techs.

As SemiAccurate first reported over a year ago, Intel is putting some memory on the Haswell package. There were a few details SemiAccurate got wrong back then, most notably the size, which we reported as 1GB instead of 1Gb, aka 128MB. Additionally we said it was likely interposer based, but this too was incorrect; instead of a wide, slow interface on silicon, Crystalwell is a narrow, fast interface on a PCB.

The memory itself is quite interesting though: it acts as an L4 cache for the entire CPU, not just the GPU. 128MB is large enough to be very useful in most circumstances; the architects told SemiAccurate that modeling indicated 32MB was sufficient, so they doubled it and doubled it again just to make sure it was future-proof. We like overkill and this appears to be sufficient overkill.
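As a quick sanity check on those figures, here is the arithmetic (our own back-of-the-envelope sketch, nothing from Intel’s disclosure): 1Gb of eDRAM works out to 128MB, and doubling the modeled 32MB twice lands on the same number.

```python
# Back-of-the-envelope check on the Crystalwell capacity figures.
bits = 1 * 1024**3               # 1Gb of eDRAM, in bits
print(bits // 8 // 1024**2)      # -> 128 (MB), i.e. 1Gb == 128MB

modeled_mb = 32                  # what the architects said modeling called for
print(modeled_mb * 2 * 2)        # doubled twice "to be sure" -> 128MB
```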

Intel Haswell with Crystalwell block diagram

Blocky memory diagram for Haswell

One of the biggest mysteries was who is supplying this eDRAM to Intel and what configuration it uses. The answer there is easy: Intel is back in the DRAM business and making this die on a 22nm process in Oregon. The main reasons to make the memory in-house are power, size, and customization. The 22nm process has very low leakage, ideal for memory in many ways, but also very costly for the size of the part Intel needs. This is somewhat counterbalanced by the process geometry, but the memory still ends up being about a third the size of the full 177mm^2 GT3 die, and it pulls between 0.5-1W at idle with a 4W TDP. That more than likely necessitates the separate chipset package for thermal reasons alone.

The eDRAM itself is quite unique; Intel would not go into specifics about the connection to the CPU other than that it is a custom narrow interface running at very high speeds. As a cache it can support more than 50GB/s in each direction, 100GB/s+ in total, with lookups running in parallel to the LLC (the L3 in this case). Internally the eDRAM is highly banked, as you would expect from something used as an L4 cache, and the hit rate is said to rarely drop below 95%. Not bad. In lower power states the L4 can be powered down completely, transparently to the CPU. It effectively supplements main memory bandwidth as far as the CPU is concerned.
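To get a feel for what a ~95% hit rate on a 50GB/s-per-direction cache means for the memory system, here is a minimal model. It is our own sketch; the streaming demand figure is an assumption for illustration, not a measured workload.

```python
# Rough model of how the L4 hit rate offloads main memory traffic.
# The eDRAM figures come from the article; the demand number is assumed.
L4_BW_GBPS = 50.0      # per direction, per the article
HIT_RATE   = 0.95      # "rarely drops below 95%"

def dram_traffic(requested_gbps: float, hit_rate: float = HIT_RATE) -> float:
    """GB/s that still has to be fetched from main memory after L4 filtering."""
    return requested_gbps * (1.0 - hit_rate)

demand = 0.8 * L4_BW_GBPS    # assume a GPU streaming 40GB/s, within the L4's reach
print(f"DRAM sees {dram_traffic(demand):.1f} GB/s of a {demand:.0f} GB/s demand")
# -> DRAM sees 2.0 GB/s of a 40 GB/s demand
```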

One of the biggest changes in Haswell is around power, be it use, savings, or delivery. Unfortunately most of the really cool things Intel did are not just under the hood, they will never be called out at all; they just work transparently. CPU power management is an old trick, and things like power gating and sleep states are reaching the point of diminishing returns. Like Silvermont, the biggest bangs in this arena are mostly off-die, with one big exception: voltage regulators.

Voltage regulators (VRs) are an extremely important part of a system that is never mentioned by enthusiasts. The more voltage planes a CPU or SoC has, the more VRs it needs. Want independent voltages for each core? Multiply the core VRs by four to start with. Where you put them, how fast they can switch between levels, and how you cool them are not trivial issues. If you look at a modern CPU package you can see the components on the PCB; multiplying that count gets to be quite a tricky packaging problem. Intel solved this by putting the VRs on die, or at least mostly on die.

Intel voltage planes and FiVR diagram

There is only one FiVR

This tech is called FiVR and it incorporates all of the VRs, save the inductors, on the Haswell die itself. This not only solves the packaging problem but also some current delivery and switching time problems, both critically important to a modern system. FiVR can deliver much higher current and switches at 125MHz, 5-10x faster than off-die components. This gives Haswell much cleaner and more stable power by narrowing variance, among other things, but also lets it wake up much faster. Power is saved through increased efficiency and the ability to get to sleep faster and more often than before. FiVR is a clear win for Intel on the CPU side, and it simplifies board design too.
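Taking those numbers at face value, the switching-speed gap works out like this (simple arithmetic on the quoted figures, nothing more):

```python
# Switching periods implied by the quoted regulator frequencies.
fivr_hz = 125e6                          # on-die FiVR, per the article
print(f"FiVR period: {1e9 / fivr_hz:.0f} ns")          # -> 8 ns

for factor in (5, 10):                   # "5-10x faster than off-die components"
    off_die_hz = fivr_hz / factor
    print(f"{factor}x slower off-die part: {off_die_hz / 1e6:.1f} MHz, "
          f"{1e9 / off_die_hz:.0f} ns per switching cycle")
```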

Next up is something that Intel calls Power Optimizer (PO), and it is unlikely to have an immediate effect on the Haswell user. Intel describes PO as deterministic and bounded power management; in essence it means Haswell is aware of the devices connected to it and what their capabilities are. It would know, for example, that a USB device may need Xµs to wake up, a PCIe device Yµs, and a drive Zµs.

Intel Haswell Power Optimizer timings

Does anyone else think PO has some bad overtones?

This allows the system to both keep things asleep until the last µs and wake them up in time for the next event. If the system is in a sleep state and knows it will get woken up after a certain time, it can wake slower devices early and faster devices later, saving power. Better yet, everything will be ready in time for whatever requires it; the guesswork is removed from the process.
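A toy illustration of the idea follows. The device names, latencies, and the whole notion of an interface here are hypothetical, not Intel’s; the point is simply that with known wake latencies and a known deadline, each device can be started as late as possible and still be ready on time.

```python
# Hypothetical illustration of deadline-driven wake scheduling.
# Device names and latencies are invented for the example; nothing here
# reflects Intel's actual Power Optimizer interface.
wake_latency_us = {
    "usb_hub":  400,     # the "X us" device from the example above
    "pcie_nic": 150,     # the "Y us" device
    "ssd":      3000,    # drives are slowest, so they get woken first
}

def wake_schedule(event_at_us, latencies):
    """Latest moment each device can start waking and still be ready in time."""
    return {dev: event_at_us - lat for dev, lat in latencies.items()}

next_event_us = 10_000   # say a timer will fire 10ms from now
for dev, start in sorted(wake_schedule(next_event_us, wake_latency_us).items(),
                         key=lambda kv: kv[1]):
    print(f"wake {dev:8s} at t={start:>5} us, ready by t={next_event_us} us")
```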

In engineering terms it puts hard bounds on whatever safety margins used to be there for sleep states and maximizes power-down time. It pays dividends now but won’t really come into play until the device ecosystem learns how to communicate its needs properly back to the CPU. In a few years PO will be a big deal, but for now it is just a good thing.

The next new power feature is called SDP, and while it is an interesting feature, how Intel goes about marketing it is downright dishonest. If you recall back to CES where the “7W” Ivy Bridge was announced, Intel called it 7W and only briefly mentioned that it was SDP, not TDP. TDP was not mentioned anywhere, mainly because it wasn’t a 7W part; it was a 17W part that Intel invented a new naming scheme for. SDP also wasn’t defined in any way, and the tame press just wrote it up as 7W, as Intel intended. The truth lost, dishonesty won.

SDP is effectively power capping: it takes a higher TDP part, 15W in the case of Haswell, and puts a power cap on it, limiting it to whatever wattage is needed for the design. The real difference between SDP and TDP is that TDP is the maximum heat a device can dissipate for a sustained period of time. You could look at TDP as a rating for the capacity of the heatsink Intel wants an OEM to design. Turbo can exceed TDP for limited amounts of time, but the heat generated by a turboing device will still average out under TDP.

SDP works like turbo in the sense that the chip can use the higher frequencies defined by the silicon’s TDP for short periods of time, but the sustained thermal load stays far below what the chip’s TDP supports. This allows a system designer to specify an SDP for a device that is lower than the CPU’s TDP and be confident that, over thermally significant periods of time, the chip will not put out more than that limit. For short periods it can run at the full frequency of the higher TDP part, but not for long.
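A minimal way to see the TDP/SDP distinction in numbers, under assumed values rather than anything Intel has published: the chip may burst at TDP-class power, but the long-term average is held at or below the SDP figure, which fixes how much of any window can be spent at full tilt.

```python
# Toy duty-cycle model of power capping: burst at TDP, keep the long-term
# average at SDP. All numbers are illustrative assumptions, not Intel's.
TDP_W   = 15.0   # what the silicon is really rated for (per the article)
SDP_W   = 7.0    # the marketed "scenario" figure
FLOOR_W = 3.0    # assumed low-power state the chip throttles down to

# Fraction of a thermally significant window the chip may spend at full
# TDP power before throttling, so that the average stays at SDP.
burst_fraction = (SDP_W - FLOOR_W) / (TDP_W - FLOOR_W)
print(f"At most {burst_fraction:.0%} of the window can run at TDP-level power")
# -> roughly 33% with these assumed numbers; the rest of the time is throttled
```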

Our problem with SDP is that Intel is deliberately using it to confuse customers by claiming wattages that are half or less of what the chip actually delivers. The OEM will cap the chip at xW while the marketing people claim the full frequency of the un-capped part, knowing that it can’t be sustained. It is similar to companies that claim turbo frequencies as the chip speed, but this time Intel is effectively claiming both the turbo frequency as the main frequency and the capped wattage as the real wattage. Doing both deliberately is completely unethical, but that has never stopped Intel in the past. Just think of SDP as an underclocked CPU that costs more.

Another power-saving technology in Haswell is called panel self-refresh, and it works just like it sounds. At times when the screen is not changing, which is the majority of the time a PC is on, the screen effectively manages itself. The panel has a little bit of memory on board, and when nothing changes, instead of receiving all the video data over the bus every 1/60th of a second or so, it just reads from its own memory to regenerate the image. The video bus and the GPU can go to sleep, saving power; the GPU just needs to signal the panel when there is a change. If you think about how few µs it takes to put Haswell to sleep and compare that to how long it takes to press a key or move a mouse, you can get an idea of the potential savings here. While Intel is very late to this game, the inclusion of panel self-refresh in Haswell is a good thing.
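In rough pseudocode terms the concept looks like this; it is a hedged sketch of the idea, not the actual eDP PSR protocol: frames only cross the display link when something changed, otherwise the panel repeats the last frame from its own buffer.

```python
# Conceptual sketch of panel self-refresh: frames only cross the display
# link when the content changes. This is the idea, not the eDP PSR spec.
def drive_display(frames):
    last_sent = None
    link_transfers = 0
    for frame in frames:            # one entry per ~1/60s refresh interval
        if frame != last_sent:
            link_transfers += 1     # GPU and link wake up to send the update
            last_sent = frame
        # else: the panel re-scans the frame from its own local memory
    return link_transfers

# Typing one character during a second of display time: 60 refreshes, one change.
frames = ["desktop"] * 30 + ["desktop+keypress"] * 30
print(f"{drive_display(frames)} link transfer(s) instead of {len(frames)}")
```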

In the end Haswell is quite an evolutionary device for a brand new core. We would love to tell you about that architecture, but Intel has never briefed us on it so we don’t have a clue what is in there. Other than the voltage regulators, everything is about what you would expect: a mildly warmed-over Ivy Bridge. The packaging is quite new however, as is the power management of the system as a whole. Those two things will bring pretty serious improvements to the end user but are hard to quantify individually. No big bangs but no downsides either, that is Haswell.S|A
