CXL Consortium board members
Let's start out with the politics of CXL, or more importantly, the seeming lack of politics. The board of the consortium is made up of 14 companies: chipmakers, OEMs, and others. It is the chipmakers, or at least four of them, that are the interesting bit. As the CXL spokesperson pointed out, this is the first time all of the major CPU designers, Intel, AMD, ARM, and IBM, are cooperating on an interconnect.
That is important. In the past, one or two players would dump something on the industry and then try to force it into mass adoption, usually accompanied by some underhanded tricks to gain competitive advantages. Some may argue that the initial CXL spec had some of this when Intel put it out there, basically a foil to CCIX, but those accusations seem to have gone by the wayside.
ARM, for example, will use CCIX for chiplet and socket-to-socket comms and CXL for CPU-to-device signaling. To oversimplify, CCIX is more useful in multi-master situations while CXL is aimed at single-master topologies. As both specs move forward, CXL will subsume most of CCIX's functionality and any differences will mostly fade. At the moment everything looks happy and decisions appear to be made for engineering reasons, not marketing; let's hope that never changes.
The three major CXL protocols
As a quick recap, there are three current CXL protocols, CXL.io, CXL.cache, and CXL.mem, which all do somewhat different things. CXL.io is the base spec and it is compatible with PCIe5; in fact you can interleave the two protocols on a per-packet basis. CXL.cache adds coherence with the cache of an accelerator, and CXL.mem is the protocol for coherence with external pools of memory. To these basic specs, CXL 2.0 added a few goodies.
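To make the split concrete, here is a toy Python sketch, not anything from the spec, of a link carrying all three protocols plus plain PCIe traffic interleaved per packet. Every name here is invented for illustration:

```python
from enum import Enum

class Proto(Enum):
    PCIE = "PCIe 5.0"        # plain PCIe traffic on the same link
    CXL_IO = "CXL.io"        # base protocol: discovery, config, DMA
    CXL_CACHE = "CXL.cache"  # device coherently caches host memory
    CXL_MEM = "CXL.mem"      # host coherently accesses device memory

def demux(stream):
    """Sort a mixed per-packet stream into one lane per protocol,
    modeling the per-packet interleaving described above."""
    lanes = {p: [] for p in Proto}
    for proto, payload in stream:
        lanes[proto].append(payload)
    return lanes

lanes = demux([(Proto.CXL_IO, "cfg-read"),
               (Proto.PCIE, "tlp"),
               (Proto.CXL_MEM, "mem-rd")])
```

The point of the sketch is only that one physical link carries several logical protocols, packet by packet, with no fixed time slicing.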
CXL 2.0 Switching
The first of these is the long-awaited switching protocol, which does exactly what it sounds like. Unlike CCIX, which can do some pretty mind-bending topologies, CXL keeps it simple. For latency reasons a CXL 2.0 switch can only be one layer deep; no cascading here. If you think about it, when you are doing highly latency-sensitive things like cache coherency, variable latencies are not your friend. Usually the simpler something is, the faster it is, and CXL 2.0 errs on the side of speed and consistency for good reason.
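A rough way to picture the one-layer rule, purely as an illustrative sketch with made-up structures rather than anything from the spec:

```python
def switch_depth(node):
    """Count the deepest chain of switches at or below this node."""
    here = 1 if node["type"] == "switch" else 0
    below = max((switch_depth(c) for c in node.get("children", [])), default=0)
    return here + below

def valid_cxl2_topology(host):
    # CXL 2.0 allows at most one switch layer between host and device.
    return switch_depth(host) <= 1

# Host -> switch -> device: fine.
flat = {"type": "host", "children": [
    {"type": "switch", "children": [{"type": "device"}]}]}

# Host -> switch -> switch -> device: cascading, not allowed.
cascaded = {"type": "host", "children": [
    {"type": "switch", "children": [
        {"type": "switch", "children": [{"type": "device"}]}]}]}
```

One hop through a switch gives every device a predictable, bounded latency; a second hop would make that latency depend on where in the tree a device sits, which is exactly what coherency traffic cannot tolerate.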
CXL 2.0 Pooling with multiple domains
Pooling is where things get interesting, and again it does what it sounds like. Each device can have up to 16 logical coherence domains, enforced by hardware. Those domains can be controlled by hosts of enormous complexity, but only 16 hosts can be seen by a device. This isn't as big a problem as it seems, because the host itself has a memory controller by definition, and that controller can subdivide the device's coherence domain on its own, just as it does with its own memory and devices, without the switch being aware. That said, the switch will see it as one domain and pass the requested data; simple is lower latency and consistent.
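To illustrate the 16-domain limit, here is a hypothetical toy model; the class and names are invented for this sketch and are not a real CXL API:

```python
MAX_DOMAINS = 16  # per the article: a device exposes at most 16 coherence domains

class PooledDevice:
    """Toy model of a pooled memory device handing out logical
    coherence domains, one per host, up to the hardware limit."""
    def __init__(self):
        self.domains = {}  # domain id -> owning host

    def assign(self, host):
        if len(self.domains) >= MAX_DOMAINS:
            raise RuntimeError("device already exposes 16 coherence domains")
        domain_id = len(self.domains)
        self.domains[domain_id] = host
        return domain_id

pool = PooledDevice()
ids = [pool.assign(f"host{i}") for i in range(16)]
# A 17th consumer cannot get its own domain; it has to sit behind one of
# the existing 16 hosts, which subdivides its domain with its own memory
# controller while the switch still sees a single domain.
```

The design choice mirrors the article's point: the hard limit lives at the device and switch, where simplicity buys latency, while any finer-grained sharing is the host's problem.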
The last bit isn't really a CXL feature as much as it is a PCIe5 feature. Since CXL is built on PCIe5 technologies, it can inherit the goodies it needs, and in this case that means PCIe5's end-to-end encryption. This means all those external accelerators, memory pools, and even potentially persistent memory/NVRAM will have their security concerns, at least those on one side of the controller, taken care of by the spec.
To wrap it all up, we have an all-singing, all-dancing CXL Consortium board that is doing the right thing for the right reasons and thinking about hugging in a socially distanced world. Switching adds a lot of flexibility to the protocol, and with the right fabric manager you can do all sorts of tricks with pooling. Now that the CXL 2.0 spec is done, the device makers can make a new class of toys for us to play with, and that is the important bit.S|A