AMD puts massive SSDs on GPUs and calls it SSG

Siggraph 2016: Game changing performance, impossible now possible

AMD just changed the GPU game forever with their Radeon SSG technology. If you have been waiting for a GPU with massive flash storage on board, SemiAccurate has good news for you.

It is hard to overstate what a sea change AMD’s SSG technology brings; putting large, low-latency storage on a GPU will open up some amazing opportunities. More importantly, things that simply could not be done on a GPU before are now not only possible but practical. For the professional space in movie rendering, previsualization, and massive, complex CAD models, persistent GPU storage really does change how things will be done.

On the technical front the idea is easy enough to explain: take a couple of M.2 PCIe3 4x NVMe SSDs and slap them onto a professional GPU. Connect them with eight PCIe lanes peeled off the main bus, and off you go. Not many more technical details were given out, nor were pictures of the card or board, but that should be enough to get us started.
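As a rough sanity check on what those lanes can carry, the Python sketch below works out the theoretical PCIe 3.0 bandwidth for one x4 SSD and for both drives on eight lanes. It uses the published PCIe 3.0 figures of 8 GT/s per lane and 128b/130b encoding, and ignores packet and protocol overhead, so real throughput will be lower. The x16 comparison is our assumption; AMD did not say how wide the card's main bus is, and the function name is ours.

```python
# Back-of-the-envelope PCIe 3.0 bandwidth for the SSD links described above.
# Spec figures: 8 GT/s per lane with 128b/130b line encoding.
# Packet headers and flow control are ignored, so these are upper bounds.

GT_PER_SEC = 8e9        # PCIe 3.0 raw transfer rate per lane
ENCODING = 128 / 130    # 128b/130b encoding efficiency


def pcie3_bandwidth_gbs(lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    bits_per_sec = GT_PER_SEC * ENCODING * lanes
    return bits_per_sec / 8 / 1e9


per_ssd = pcie3_bandwidth_gbs(4)     # one M.2 x4 drive
both_ssds = pcie3_bandwidth_gbs(8)   # two drives on eight lanes
full_slot = pcie3_bandwidth_gbs(16)  # assumed x16 slot, for comparison only

print(f"x4: {per_ssd:.2f} GB/s, x8: {both_ssds:.2f} GB/s, x16: {full_slot:.2f} GB/s")
```

These are link ceilings only; what the SSDs themselves can sustain depends on the drives AMD fits, which was not disclosed.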

In short you can now have terabytes of persistent low latency storage on your GPU, with persistent, low latency, and terabytes being the game changing parts so we will repeat them, several times. Terabytes of persistent low-latency storage on your GPU, really important. The hardware is just off the shelf M.2 SSDs so the low-level wear leveling will be handled by the hardware and users will see this first generation as just storage. If AMD doesn’t implement a flat memory model in very short order in the next generation or two, we will happily eat one. HINT.

So users now have a fairly coarse block addressable space on their professional GPUs, what’s the big deal? You don’t have to traverse the system PCIe bus, the driver stack, CPU, back over the bus, and to storage to get data now, so the impossible is now possible. Better yet, the CPU overhead of sending data to the GPU is now gone, as is the bus congestion brought on by streaming large textures from plentiful system memory to precious GPU memory. This is the roundabout technical way of saying that SemiAccurate thinks those 8x PCIe lanes peeled off for the SSDs are going to be more than made up for by the system traffic they free up. Faster, less energy, and lower latency.

But what about these benefits? The first demo AMD is said to be showing is a use case for movie editing and cleanup on the GPU. What is the issue here, you may ask? This is old hat and has been done on the CPU for years. Some GPUs can even assist it without slowing things down in the process, so what does SSG add? How about 8K movie streaming and cleanup in realtime. At 96 FPS. Sure, you can do this with traditional methods, but the best of them will run the same task at 17 FPS.
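For anyone who wants to check the arithmetic behind those two frame rates, the short Python sketch below computes the ratio and the per-frame time budget each one implies. It uses only the 96 and 17 FPS figures quoted for the demo; the variable and function names are ours.

```python
# Speedup and per-frame time budget implied by the two quoted frame rates.

SSG_FPS = 96        # frame rate quoted for the SSG demo
BASELINE_FPS = 17   # frame rate quoted for the best traditional method


def frame_budget_ms(fps: float) -> float:
    """Time available to fetch and process one frame, in milliseconds."""
    return 1000.0 / fps


speedup = SSG_FPS / BASELINE_FPS
ssg_budget = frame_budget_ms(SSG_FPS)
baseline_budget = frame_budget_ms(BASELINE_FPS)

print(f"speedup: {speedup:.2f}x")
print(f"per-frame time: {ssg_budget:.2f} ms vs {baseline_budget:.2f} ms")
```

In other words, each 8K frame has to be fetched and cleaned up in roughly 10 ms on the SSG setup, against almost 59 ms on the traditional one.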

AMD is happy to point out this is a 5.6x speedup or so for the cost of two consumer SSDs. Before SSG, possible but slow. After SSG, fast enough for most users. The impossible, realtime 8K cleanup, is now possible.

Hugely complex and highly detailed CAD models that took the better part of an hour to load up and decompress will still take the better part of an hour to load up and decompress on an SSG GPU based system. Why bother? Because it takes the better part of an hour to load and decompress the first time, then it can stay resident on the GPU’s flash storage. The second time it should take seconds. If you look at the cost of a modern automotive or aerospace engineer’s time, SSG is a no-brainer; any CTO would be foolish not to deploy this tech ASAP, and the ROI would be measured in weeks.
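To put rough numbers on that ROI argument, here is a minimal Python sketch of the break-even math. Almost every input is an illustrative assumption of ours, not a figure from AMD: 50 minutes stands in for "the better part of an hour", 30 seconds for "seconds", and the $100 hourly rate is purely hypothetical. The only sourced number is the $9999 developer kit price, used here as a deliberately pessimistic hardware cost.

```python
import math

# Illustrative inputs; only the $9999 developer kit price comes from the
# article. Everything else is an assumption for the sake of the example.
FIRST_LOAD_MIN = 50.0       # assumed: "the better part of an hour"
RELOAD_MIN = 0.5            # assumed: "seconds", taken as 30 s
HOURLY_RATE_USD = 100.0     # hypothetical fully loaded engineer cost
HARDWARE_COST_USD = 9999.0  # developer kit price, a pessimistic ceiling


def savings_per_reload_usd(first_load_min, reload_min, hourly_rate):
    """Engineer time saved by one reload from on-card flash, in dollars."""
    saved_hours = (first_load_min - reload_min) / 60.0
    return saved_hours * hourly_rate


def break_even_reloads(cost, per_reload_saving):
    """Number of reloads needed before the savings cover the hardware."""
    return math.ceil(cost / per_reload_saving)


saving = savings_per_reload_usd(FIRST_LOAD_MIN, RELOAD_MIN, HOURLY_RATE_USD)
reloads = break_even_reloads(HARDWARE_COST_USD, saving)
print(f"${saving:.2f} saved per reload, break-even after {reloads} reloads")
```

Change the constants to match your own shop; the point is only that the payback is counted in model loads, not years.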

I could go on about the benefits but you get the idea, it changes the game in a fundamental way that few recent technologies have done. It really is that important for the professional graphics space, and what starts there trickles down to the consumer space in short order. A single 4x version with 128 or 256GB of space on board is easily within the price range of high-end consumer GPUs and it would effectively wipe out most texture size limits. Streaming texture hacks, brilliant as they are, would be unnecessary, drivers simplify, CPU load goes down, and usable PCIe bandwidth goes up while PCIe latency plummets because of congestion relief. Fundamentally game changing for consumers too.

So when can you get this magical elixir of low-latency, terabyte sized, persistent GPU storage? How about right now. No, we are not kidding, it is available on AMD’s web site as we speak. It is an SDK aimed at developers for the moment and costs $9999 if you qualify, but it is real. Before you cringe at the price, most of that money is probably spent on developer support, not hardware. Retail versions are promised for 2017.

What magical things can you do with your new AMD SSG equipped GPU when you get it? Nothing really; at the moment it is all demos and tools, but the potential is unlimited. It really is the technology of the year and the impossible tasks made possible already are just the tip of the iceberg. SemiAccurate is not joking when we say this is a fundamental game changer for graphics; nothing like this has happened in years. For some markets it will simply be indispensable, for others it offers a step change in TPC, and all for the cost of a few wires on the PCB, a socket, and a flash-aware memory controller already on the die. If you don’t want SSG functionality, no problem, don’t buy a bundled SSD with your GPU. No SKU headaches, no manufacturing pain, just performance if you need it. Terabytes of persistent, low-latency storage does the impossible and AMD calls this near miracle SSG. If you are a user of professional graphics, within a year you won’t be able to live without it. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate. SemiAccurate is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also a council member with Gerson Lehrman Group.