What Is Marvell’s Custom HBM?

Meet the new family of memory devices

Marvell recently announced Custom HBM memory, or CHBM, but what is it? SemiAccurate dug in a bit and found out it isn't one thing but a family of products.

You may be familiar with HBM, or High Bandwidth Memory, first used over a decade ago on AMD GPUs and Xilinx FPGAs. It hit its bandwidth goals by using 1024 data lines running at sane speeds rather than the normal clocking to the moon. This saved some power but necessitated interposers or other silicon interconnects whose expense relegated the tech to certain niches. That said, when it was used, it worked quite well.
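The wide-and-slow trade-off is simple arithmetic. A minimal sketch, using ballpark public figures (roughly 1 Gb/s per pin for first-generation HBM and roughly 7 Gb/s for contemporary GDDR5; these numbers are assumptions, not from the article):

```python
# Peak bandwidth of a memory bus: width in bits times per-pin data rate,
# divided by 8 to get bytes. Figures below are rough public ballpark specs.

def bandwidth_gbytes_per_s(width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus of width_bits at gbps_per_pin."""
    return width_bits * gbps_per_pin / 8

# First-gen HBM: 1024 data lines at a sane ~1 Gb/s per pin.
hbm1 = bandwidth_gbytes_per_s(1024, 1.0)   # 128.0 GB/s per stack

# A contemporary GDDR5 chip: 32 data lines clocked to the moon at ~7 Gb/s.
gddr5 = bandwidth_gbytes_per_s(32, 7.0)    # 28.0 GB/s per chip

print(f"HBM1 stack: {hbm1:.0f} GB/s, GDDR5 chip: {gddr5:.0f} GB/s")
```

The wide bus wins on bandwidth per stack while each pin toggles far more slowly, which is where the power savings come from.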

Another problem HBM had was capacity. It was soldered down and there was only one stack per 'channel'. Until the current HBM3 generation, capacity was limited to single digit GB per stack, and even now it is only modestly above that. HBM4 doubles the interface to 2048 data lines and capacity goes up a little too, but the fundamental problem is DRAM density.
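Since the stack footprint is fixed, capacity per stack is just die density times stack height, which is why DRAM density is the real limiter. A quick sketch with representative public figures (an 8 Gb die in an 8-high HBM2 stack, a 24 Gb die in a 12-high HBM3e stack; assumptions for illustration, not numbers from the article):

```python
# Stack capacity = per-die density (Gbit) x number of stacked dies, / 8 for GB.
# The only levers are die density and stack height; the footprint is fixed.

def stack_capacity_gbytes(die_gbit: int, layers: int) -> float:
    """Capacity in GB of an HBM stack of `layers` dies at `die_gbit` each."""
    return die_gbit * layers / 8

hbm2_8hi   = stack_capacity_gbytes(8, 8)    # 8.0 GB per stack
hbm3e_12hi = stack_capacity_gbytes(24, 12)  # 36.0 GB per stack

print(f"HBM2 8-high: {hbm2_8hi:.0f} GB, HBM3e 12-high: {hbm3e_12hi:.0f} GB")
```

Going taller runs into the yield and cost problems described next, so most of the historical capacity gains came from denser dies, not more of them.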

HBM achieves the capacity it does by stacking DRAM die on top of each other. This is costly, has yield problems, and makes engineers twitch when you mention 'known good die'. Those problems are solvable, but solving them costs money and lowers yield. Capacity itself isn't really solvable this way; it is more of a semiconductor/memory process issue that is somewhat tangential to the HBM stacking spec. Why? The footprint of an HBM die is fixed, so your DRAM die can only be so big.

So they just add more stacks, right? Yes, to a point that is the solution, but it brings us to three more problems: beachfront area, stack height, and memory controller area. Stacks only go so high before they become economically infeasible. You need a controller for each stack, so more stacks mean a non-trivial area adder that could otherwise be put to productive use, or left off for lower cost with a smaller die.

That leaves beachfront area, which in many ways is the killer problem. 1024 or 2048 lines means a lot of pins and a lot of traces. These traces need to be really short for power and latency reasons, so their pins have to be at the edge of the die. There is only so much edge space to go around, and the HBM spec has one side of the memory stack completely taken up by I/O pins. The host die is in the same boat; memory pins have to line its sides.

Basically your stack count is limited by how long the edge of your die is. If you want more, well, tough luck, make a bigger die or use a different tech. Since most HBM-bearing devices are relatively monstrous beasts that push the reticle limit anyway, bigger isn't usually an option. This means with a reticle limit of ~800mm², you are limited to 3, possibly 4, HBM stacks per side of the die. If you don't need other I/O, you might be able to add another on the shorter side, but since no one in the industry seems to be doing that, there are probably other reasons lurking in the background.
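The per-side limit falls straight out of the geometry. A sketch under assumed dimensions (a full reticle field of roughly 26mm x 33mm, and an HBM stack occupying roughly 11mm of edge; both are ballpark public figures, not from the article, and real layouts also lose edge to keep-out spacing and other I/O):

```python
import math

# How many HBM stacks fit along one edge of a reticle-limit die?
# Assumed (illustrative) dimensions: reticle field ~26mm x 33mm (~858mm²),
# HBM stack occupying ~11mm of die edge for its I/O side.

RETICLE_SHORT_MM = 26.0
RETICLE_LONG_MM = 33.0
STACK_EDGE_MM = 11.0

def stacks_per_edge(edge_mm: float, stack_mm: float = STACK_EDGE_MM) -> int:
    """Whole stacks that fit along a die edge, ignoring keep-out spacing."""
    return math.floor(edge_mm / stack_mm)

print(stacks_per_edge(RETICLE_LONG_MM))   # 3 stacks on the long edge
print(stacks_per_edge(RETICLE_SHORT_MM))  # 2 on the short edge, if it is free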

Marvell Custom HBM with potential benefits

So what is Marvell doing? Well their Custom HBM is just that, custom rather than a standard. HBM4 doubles pin count, ups the clock, and adds capacity as expected but also does the unexpected by adding a base logic layer to the stack. Anyone recall Micron’s Hybrid Memory Cube or Intel’s “not” HMC variant for the early Larrabee style products? Nothing new under the sun in the industry as it turns out.

This logic layer allows all sorts of tricks to happen to the HBM stack, but for HBM4 it addresses exactly none of the problems listed above. Beachfront area is fixed, stack size is fixed, die area is fixed, the pin layout is prescribed, and so on. You could argue that some of the memory controller area on the main die can be shifted to the logic layer, but that is pretty minor. On the up side you get a diverse supplier base, economies of scale, and known design specs.

Back to the point of the story, Marvell and CHBM. As the name suggests, it is custom, specifically a custom logic layer. What does this do? Want higher speed? More dense pins? Serial high speed I/O instead of wide parallel? Wide serial? Hello Kitty approved traces? Different sized footprint for the stack? Two memory die stacks on a CHBM ‘chip’? You can do any or all of this now because it is custom.

Therein lies the problem: you are going from a standard off-the-shelf product, with all of its benefits and handcuffs, to a custom device that has exactly zero off-the-shelf anything backing it up. It is bespoke, not volume. Anyone going down this path had better know what they are doing, have the design chops to pull it off, have the volume to make the crushing expense less so, and have a product that can bear the costs. Then you need to convince a bunch of memory companies, commodity players with a legendary risk aversion, to make it. Simple enough, right?

At their analyst day, Marvell had Samsung and Hynix on stage; Micron could not appear due to quiet periods but does support the effort. So all three major memory players are on board with CHBM, which means many of the potential issues just went away. So far so good. Then again, if you don't have volume, fixed costs will kill your design. To SemiAccurate this is the only real potential problem.

If Marvell lays out a bunch of IP options that a company can choose from for the base die, and potentially supplies the controller IP on the die too, many of these issues can go away. Instead of a fully custom HBM stack, it is a semi-custom one. This will get you most or all of the benefits and reduce most of the pain points.

So with CHBM, Marvell isn't proposing a single tweaked HBM device, it is proposing a customizable family of products. If they have a design that is on the shelf but not a JEDEC standard, that may work. If not, you can do whatever you want, if you are willing to pay for it. In theory CHBM can address stack count, beachfront area, footprint, and all the rest. Since it is proposed for the post-HBM4e time frame, nothing is imminent, but work is likely going on right now for the first devices.

Is this a fracturing of the industry, a hobbling of JEDEC? Not so much, it is just a realization that for many devices, AI SoCs really, the handcuffs of standards are slowing things down. That sector seems to have money to burn, so, well, they are trying to burn less with CHBM. And it is a good plan; if done right, and Marvell is doing it right so far, it should be a win/win. Look at Apple and the M1; it used non-JEDEC-standard LPDDR5 to great effect, and CHBM will likely do the same. S|A


Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.