3M, Intel, and SGI today showed SemiAccurate a new immersion cooling system running Novec fluids to greatly increase rack cooling efficiency. If you thought a water cooled GPU was nifty, you haven’t seen anything yet.
Take a look at the enclosure below. If SGI and 3M are successful, this version of the ICE X supercomputer/shared memory system will be the new standard rack. Actually that is a bit of a stretch; the enclosure you see below is a concept enclosure to demonstrate what can be done with a new generation of liquid cooling. SGI contributed the system, a 72-socket Romley/Sandy-EP system with Infiniband for interconnects, 3M added their latest generation Novec fluoroketone fluid and expertise in using it, and Intel helped out with the silicon side of things.
This is what the Rack-Of-The_Future(R)(TM)(C)(?) may look like
Let’s start out with the fluid at the heart of the system: there are about 770kg of 3M Novec fluoroketones bathing all of the electronics in direct immersion. The lower portion of the enclosure is basically just a giant bathtub filled with Novec, and the electronics and power supplies are standard air-cooled SGI parts simply dunked in. There is no pump and no power used; the waste thermal energy from the CPUs does all the work.
The enclosure at work, bubble, bubble, toil, and no trouble
This demo enclosure is only half populated, the part closest to the bottom is effectively an empty rack. The top half starts out with the Infiniband switches fully immersed with the black cables running to them. Behind that under the boiling liquid are the CPUs, in this case 72 Intel Xeon E5-2690s in 2S SGI system boards. Each socket has one DIMM per channel or four DIMMs per socket, 8 per system, 288 in total.
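For anyone who wants to check the parts count, here is a minimal Python sketch of the arithmetic above. The socket, channel, and DIMM figures are the ones quoted in this article; the core count per socket is an assumption brought in from the E5-2690's public specs, not something SGI stated, and the `tally` helper is just an illustrative name.

```python
# Counts quoted in the article.
SOCKETS = 72
SOCKETS_PER_SYSTEM = 2      # "2S" system boards
CHANNELS_PER_SOCKET = 4     # one DIMM per channel, four DIMMs per socket
DIMMS_PER_CHANNEL = 1
# Assumption, not from the article: the Xeon E5-2690 is an 8-core part.
CORES_PER_SOCKET = 8


def tally(sockets=SOCKETS):
    """Return system, DIMM, and core counts for a given socket count."""
    dimms_per_socket = CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL
    return {
        "systems": sockets // SOCKETS_PER_SYSTEM,
        "dimms_per_system": dimms_per_socket * SOCKETS_PER_SYSTEM,
        "dimms_total": dimms_per_socket * sockets,
        "cores_total": CORES_PER_SOCKET * sockets,
    }


if __name__ == "__main__":
    for name, value in tally().items():
        print(f"{name}: {value}")
```

With the article's numbers this works out to 36 two-socket systems, 8 DIMMs per system, and 288 DIMMs in the half-populated tub.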
Behind those, under the silver water cooling loops, are the PSUs, once again standard 277V air-cooled AC parts from the SGI catalog. They too are fully immersed. To be direct, on the system side there wasn’t much done to the boards; the corporate triumvirate simply took off-the-shelf parts and dunked them in Novec with only the most minor changes.
This is about as complex as the changes get
The most obvious change to any of the parts is the heatsink on the CPUs you see above. Instead of the standard SGI water block on their dense rack ICE X systems, the entire block and water loops were replaced with the copper spreader you see above. It is just a simple copper plate with bumps covered in sintered copper grains like the inside of a heatpipe or vapor chamber. An entire complex water loop was replaced with a small copper slug mainly to increase the surface area.
Each 2S system has a pretty complex layout, with one socket on a board and two boards plugged into a backplane. The two boards are close enough together to interdigitate the DIMMs. (Author’s note: Oh go look it up…) This is why the air-cooled version needed water blocks: even if air cooling could remove that much heat, there wasn’t room for a large enough heat sink between the boards. If that is impressive, the density seen here is just a start using off-the-shelf parts; with bespoke server boards you could pack much more into the Tub-O-Novec, cat hair free (TM).
How does it all work? Easy enough, the hot bits like the CPUs and DIMMs are well above the 49C boiling point of Novec. It boils the fluid so you get phase change heat removal rather than just fluid temperature deltas. This is exactly how a vapor chamber works but on a vastly larger scale. Since Novec is a fairly high density fluid and the vapor is much denser than air, it all stays in the tub without any complex retention schemes.
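To put a rough number on why boiling beats simply warming the fluid, here is a hedged back-of-the-envelope sketch. The article does not give the fluid's latent heat or specific heat, so the two constants below are placeholder assumptions in the general range published for fluoroketone-type fluids; swap in the data sheet values for whichever Novec is actually in the tub. The 10 K temperature rise for the non-boiling case is also an arbitrary illustrative choice, and the function names are made up for this example.

```python
# ASSUMED fluid properties (placeholders, not from the article).
LATENT_HEAT_J_PER_KG = 88_000.0      # assumed heat of vaporization
SPECIFIC_HEAT_J_PER_KG_K = 1_100.0   # assumed liquid specific heat
ASSUMED_DELTA_T_K = 10.0             # assumed liquid temperature rise, no boiling


def vapor_mass_flow(heat_w, latent_heat=LATENT_HEAT_J_PER_KG):
    """kg/s of fluid that must boil off to carry heat_w watts away."""
    return heat_w / latent_heat


def sensible_mass_flow(heat_w, specific_heat=SPECIFIC_HEAT_J_PER_KG_K,
                       delta_t=ASSUMED_DELTA_T_K):
    """kg/s of liquid that must flow past to carry heat_w watts with no boiling."""
    return heat_w / (specific_heat * delta_t)


if __name__ == "__main__":
    load_w = 12_000.0  # the 12KW the demo was dissipating, per the article
    boil = vapor_mass_flow(load_w)
    warm = sensible_mass_flow(load_w)
    print(f"boiling: {boil:.3f} kg/s of vapor")
    print(f"warming only: {warm:.3f} kg/s of liquid")
    print(f"ratio: {warm / boil:.1f}x")
```

Under these assumed properties, each kilogram that boils carries away several times the heat of a kilogram that merely warms up by 10 K, which is the whole point of phase change cooling. The exact ratio depends entirely on the real fluid data.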
Note the vapor line on the wall next to the S|A logo
You can see where the vapor line ends up, the ‘wet’ line is clearly visible on the enclosure wall. The cooling loops basically define where the line ends up, if they are working right the vapor should never go above them. The hot vapor is moved off the hot silicon via normally wasted heat energy. The loops cool things down and the Novec just drips back to the pool. All you need is a simple divider with a slit at the bottom and the boiling fluid along with gravity will suck coolant back through to keep the CPUs immersed. No external energy inputs are needed to drive the system, there is nothing mechanical to break, and no large volume of air to bring dust in, it just works on its own.
The 3M Novec here is non-toxic, has very low greenhouse gas potential, and obviously is a dielectric. If you are familiar with similar but older fluids like Fluorinert, you can think of Novec as a much newer, safer, and better generation of it. If you put something electronic in it, it will not short, not cause problems, and is very unlikely to react with anything you dunk because of how inert it is. One demo at 3M yesterday was to write on a Post-it note with a Sharpie, dunk it in Novec, and pull it out. The ink didn’t run and the paper was almost fully dry on removal, bone dry seconds later. Several people dunked their cell phones in the fluid without problems.
You are probably wondering what happens if you dunk your hand in the hot bath. I did. Actually, before you even touch the surface, once you go below the vapor line things are instantly 49C and you can feel the density difference. This is hot enough to notice but not enough to cause damage without prolonged exposure, and you know the vapor is there immediately. The Novec vapor condenses on your hand and feels a bit oily, as you would expect a hydrophobic liquid to feel. Within seconds it starts streaming off your fingers and your hand is bone dry almost as soon as you pull it above the vapor line. Dunking your hand isn’t really much different, just more Novec-y.
So we have a zero energy input cooling loop that has no moving parts and uses a very inert fluid. Actually the Novec in question is also used as a replacement for Halon in fire suppression systems, 3M had a NASCAR car they sponsor in the lobby that used it in the extinguisher. While no one involved is claiming any direct benefits from this, don’t expect many flash fire problems from anything immersed in this enclosure.
How much heat can this demo rack remove? The system shown to SemiAccurate was dissipating 12KW at the time and it is only about half populated. SGI, 3M, and Intel claim that it is rated at 80KW or about 20x what the current 3M data center racks are specced for. 3M is claiming about 25KW heat removal per liter of Novec but that is only for these demo systems.
If you take things a lot farther like Allied Control did in Hong Kong you can go much higher. How much higher? Unlike the SGI enclosure, the Allied Control version is in a standard 19″ rack form factor. With it they are able to cool 225KW in one rack. For the math averse, this is a tad more than the 4KW that current 3M data center racks are specced for and the 2.5KW most data centers are willing to support. With a 24 rack data center, Allied Control is claiming to replace well over 100 standard racks with high density systems. They have a video of the system here and for those who understand the terminology, a PUE of 1.01 is incredible, a PUE of 1.01 in Hong Kong is more incredible-r.
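For the math averse who would still like to see the math, here is a small Python sketch of the per-rack ratios. Every constant is a figure quoted in this article; the only assumption is that the numbers can be compared directly as heat load per rack or enclosure, which glosses over the different form factors.

```python
# Per-rack heat loads quoted in the article, in kilowatts.
RATED_KW = 80.0            # demo enclosure rating claimed by SGI, 3M, and Intel
MEASURED_KW = 12.0         # load at the demo, enclosure about half populated
CURRENT_3M_RACK_KW = 4.0   # what current 3M data center racks are specced for
TYPICAL_RACK_KW = 2.5      # what most data centers are willing to support
ALLIED_RACK_KW = 225.0     # Allied Control, one standard 19-inch rack


def ratio(numerator_kw, denominator_kw):
    """How many times larger one heat load is than another."""
    return numerator_kw / denominator_kw


if __name__ == "__main__":
    print("demo rating vs 3M rack:", ratio(RATED_KW, CURRENT_3M_RACK_KW))
    print("demo rating vs measured load:", round(ratio(RATED_KW, MEASURED_KW), 2))
    print("Allied Control vs 3M rack:", ratio(ALLIED_RACK_KW, CURRENT_3M_RACK_KW))
    print("Allied Control vs typical rack:", ratio(ALLIED_RACK_KW, TYPICAL_RACK_KW))
```

The 20x figure for the demo enclosure falls straight out of 80KW over 4KW, and the Allied Control rack comes in at 90 times what a typical data center will support per rack.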
So what do you end up with? Using 3M Novec, SGI, Intel, and Allied Control were able to make ultra-efficient, high density, even higher energy density systems with no moving parts, no energy input other than the facility water loop, and nothing dangerous or toxic. And all of this is with effectively zero changes to current SGI boards, with custom designed boards you could get densities 3x higher as Allied Control did. Everything is hot plug, just as usable as a normal system, and there are no real down sides we can see.
On the weight front a Novec bath enclosure like the one pictured above weighs almost exactly as much as a normal rack, the water loops, massive copper heatsinks, fans, and everything else weigh quite a bit. Because of the form factor the load per square foot of floor space goes way down versus a single rack, and way way down compared to the footprint of an equivalent amount of compute in standard racks.
Then there is the energy savings. Most data center enthusiasts will tell you that a good data center will use about half again the energy used for compute to cool things down. Other than the facility cold water loop, the SGI and Allied Control systems use no energy. No fans to spin, no fans to break, no dust, no nothing, and all the Novec can be reused; it has an effectively infinite service life. Efficient, simple, net cheaper TCO, and no real down sides, what more could you ask for? S|A
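To make the energy claim concrete, here is a rough Python sketch comparing the "half again" rule of thumb with the PUE of 1.01 Allied Control is claiming. PUE is total facility energy divided by IT energy. The sketch assumes cooling is the only overhead, so "half again" becomes a PUE of 1.5; real facilities have other overheads too. The 225KW load is just the Allied Control per-rack figure reused as an example, and the function names are illustrative.

```python
def pue_from_overhead(overhead_fraction):
    """PUE when overhead energy is a given fraction of IT (compute) energy."""
    return 1.0 + overhead_fraction


def overhead_kw(it_load_kw, pue):
    """Non-compute power drawn for a given IT load at a given PUE."""
    return it_load_kw * (pue - 1.0)


def total_energy_saving(pue_before, pue_after):
    """Fraction of total facility energy saved by moving between two PUEs."""
    return 1.0 - pue_after / pue_before


if __name__ == "__main__":
    it_load_kw = 225.0                       # example load, from the article
    conventional = pue_from_overhead(0.5)    # "half again" the compute energy
    immersion = 1.01                         # Allied Control's claimed PUE
    print(f"conventional overhead: {overhead_kw(it_load_kw, conventional):.2f} kW")
    print(f"immersion overhead: {overhead_kw(it_load_kw, immersion):.2f} kW")
    saving = total_energy_saving(conventional, immersion)
    print(f"total energy saved: {saving:.1%}")
```

Under those assumptions, the same compute load needs roughly a third less total energy at a PUE of 1.01 than at 1.5.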
[Editor’s note: We still recommend you check the MSDS before you bathe your cat in this.]