AMD was talking about its new TrueAudio tech in the R7 260X and R9 290/290X GPUs, but it skipped over why it is so important. Like Mantle before it, TrueAudio is a stunningly important technology for gaming and quite the differentiator for AMD.
The idea behind TrueAudio is simple enough: take a sound card and slap it on the GPU, much as was done with video decoders a decade or so ago. Graphics plus sound, what’s the big deal? Guaranteed performance, locality, features that just weren’t possible before, and compatibility. These are listed in ascending order of importance, but more on that after we take a look at the TrueAudio tech itself.
The TrueAudio block is based on Tensilica HiFi-EP audio DSP cores and talks to the main GPU indirectly. Since AMD never briefed us on the CI/Bonaire architecture and we are not yet allowed to talk about the 290/290X/Hawaii architecture, all we can say is that the ACP is connected to the GPU memory controller. We have no idea what the ACP is, though, and AMD had not answered that question at the time of publication.
There are two takeaways here. First, there is very low latency access between the GPU and the sound card, something whose extreme importance we will explain later. Second, the TrueAudio-bearing GPUs, the CI and VI families, can have graphics memory that is coherent with system memory. While this is a really good thing, the TrueAudio memory is not coherent with GPU memory and therefore not coherent with main system memory either. We would guess that coherency, or at least a common memory space, will arrive within the next generation or two of the tech.
If you want to do something with the TrueAudio DSP hardware, it looks like you set things up with a DMA transfer and then let it rip. We say “looks like” because there is a block in the TrueAudio unit called the Streaming DMA Engine that sounds like it would do this, but we were unable to get any details on its function or tech from AMD, so consider this an educated guess. The units labeled Address Translation that are paired with each DSP core also point to separate memory spaces for the GPU and TrueAudio systems. It all looks like this.
The blocks with some explanation
The heavy lifting behind TrueAudio is done by the Tensilica HiFi-EP DSP cores, an audio-oriented licensable core. Tensilica makes really useful cores that are found in just about every kind of electronic device, from routers to TVs. The HiFi-EP is one of its flavors and, as the name implies, it is heavily geared toward audio processing. For raw math power this DSP has some pretty hefty MAC (multiply-accumulate) units, up to 32×24-bit, plus a whole lot of other goodies. As for supported formats and codecs, the list is well beyond comprehensive. In short, this little DSP can do it all as far as audio goes. The only questions we have are how many of them there are in a TrueAudio unit and whether that is a fixed figure. We could not get these questions answered by AMD.
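To make “MAC units” a bit more concrete, here is a minimal Python sketch of the bread-and-butter job they do in audio: a fixed-point FIR filter where every output sample is a chain of multiply-accumulates. The 32-bit samples, 24-bit Q23 coefficients, and function names are our illustrative assumptions; this is not the HiFi-EP’s actual instruction set or any Tensilica API.

```python
# Fixed-point FIR filter built from 32x24-bit multiply-accumulate (MAC) steps.
# Assumptions (illustrative only, not the HiFi-EP ISA): samples are signed 32-bit
# integers, coefficients are signed 24-bit Q23 fractions (value = coeff / 2**23).

Q = 23  # fractional bits in each 24-bit coefficient


def to_q23(x: float) -> int:
    """Convert a float to a signed 24-bit Q23 integer, saturating at the range limits."""
    v = round(x * (1 << Q))
    return max(-(1 << 23), min((1 << 23) - 1, v))


def saturate32(v: int) -> int:
    """Clamp to the signed 32-bit range, like a saturating DSP store."""
    return max(-(1 << 31), min((1 << 31) - 1, v))


def fir(samples: list[int], coeffs_q23: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum_k c[k] * x[n - k], one MAC per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0  # stands in for the DSP's wide accumulator (Python ints never overflow)
        for k, c in enumerate(coeffs_q23):
            if n - k >= 0:
                acc += samples[n - k] * c  # one 32x24 multiply-accumulate
        out.append(saturate32(acc >> Q))  # drop the Q23 fraction bits, then saturate
    return out


# A 4-tap moving average applied to a step input.
print(fir([1000] * 5, [to_q23(0.25)] * 4))  # [250, 500, 750, 1000, 1000]
```

A real DSP does this arithmetic in dedicated MAC hardware rather than in a software loop; the point is the shape of the computation, which is the same whether it is a filter, a mixer, or an effect.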
Each of these DSPs is connected to the aforementioned Address Translation unit; it looks like there is one separate unit per DSP core. This suggests that the memory attached to each core, a 32K instruction cache (I$), a 32K data cache (D$), and 8K of scratch RAM, sits in its own separate memory space as well. These translation units are situated in a block called Bus Routing and Bridging along with several other functional blocks.
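AMD didn’t explain what the Address Translation units actually do, so purely as an illustration of the general idea, here is a hypothetical Python sketch of the simplest form such a unit could take: a base/limit window that maps a range of DSP-local addresses onto an offset in a larger memory such as the GPU frame buffer. The class name, addresses, and window sizes are all made up.

```python
# Hypothetical base/limit address window, for illustration only.
# Nothing here reflects AMD's real Address Translation unit, whose design is undisclosed.
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationWindow:
    local_base: int   # start of the window in the DSP's own address space
    size: int         # window length in bytes
    remote_base: int  # where the window lands in the other (e.g. GPU) address space

    def translate(self, local_addr: int) -> int:
        """Map a DSP-local address to a remote address, faulting if it is out of range."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {local_addr:#x} outside translation window")
        return self.remote_base + offset


# One window per DSP core: same local address, different remote targets.
dsp0 = TranslationWindow(local_base=0x8000_0000, size=64 << 20, remote_base=0x1_0000_0000)
dsp1 = TranslationWindow(local_base=0x8000_0000, size=64 << 20, remote_base=0x1_0400_0000)
print(hex(dsp0.translate(0x8000_1000)))  # 0x100001000
print(hex(dsp1.translate(0x8000_1000)))  # 0x104001000
```

Giving each core its own window lets every core use the same local addresses while landing in different places in GPU memory, which is consistent with, but not proof of, the separate memory spaces the diagram hints at.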
These additional units are called Routing and Arbitration, Bus Bridging, and TrueAudio Registers. Beyond the obvious bits their names imply, what do they do? Beats us. We couldn’t get enough of a technical brief to say more than that they are red rectangles inside a big purple rectangle, although they are well laid out in the diagram.
The third column of the diagram is similar: most of its blocks are fairly self-explanatory, but we can add a bit of color on some of them. The 384K of shared RAM in the TrueAudio block is arranged in 8K blocks that are arbitrated independently. This means that the memory can be DMA’d to and concurrently accessed by each DSP. The text seems to imply that it is (n+1)-ported memory, where n is the number of DSPs, but we can’t say whether that is correct or whether AMD meant DMA and one DSP concurrently. In any case, the 8K block size matches up nicely with the 8K scratch RAM on each DSP, though that could be a coincidence too.
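The arithmetic is simple, 384K / 8K = 48 independently arbitrated blocks, and the payoff of that arrangement is easiest to see in a toy model. The Python sketch below assumes each bank serves one access per cycle and resolves conflicts with a rotating priority; the requester names and the policy are our assumptions, since AMD didn’t say how the arbitration actually works.

```python
# Toy model of TrueAudio-style banked shared RAM: 384K split into 8K banks,
# each arbitrated independently. Assumptions (ours, not AMD's): each bank serves
# one access per cycle, and bank conflicts go to whoever is first in a rotating
# priority order. AMD has not disclosed the real arbitration scheme.
BANK_SIZE = 8 * 1024
TOTAL_SIZE = 384 * 1024
NUM_BANKS = TOTAL_SIZE // BANK_SIZE  # 48 banks


def bank_of(addr: int) -> int:
    """Return the index of the 8K bank that a shared-RAM byte address falls in."""
    if not 0 <= addr < TOTAL_SIZE:
        raise ValueError(f"address {addr:#x} outside shared RAM")
    return addr // BANK_SIZE


def arbitrate(requests: dict[str, int], priority: list[str]) -> set[str]:
    """Grant at most one requester per bank for this cycle.

    requests maps a requester name (hypothetical, e.g. "dma", "dsp0") to the
    address it wants; priority is the current order, earlier names win conflicts.
    """
    granted: set[str] = set()
    busy: set[int] = set()
    for who in priority:
        if who in requests:
            bank = bank_of(requests[who])
            if bank not in busy:  # different banks never block each other
                busy.add(bank)
                granted.add(who)
    return granted


def rotate(priority: list[str]) -> list[str]:
    """Move the first requester to the back so someone else wins next cycle."""
    return priority[1:] + priority[:1]


order = ["dma", "dsp0", "dsp1"]
reqs = {"dma": 0x0000, "dsp0": 0x2000, "dsp1": 0x0100}  # dma and dsp1 both hit bank 0
print(NUM_BANKS)                               # 48
print(sorted(arbitrate(reqs, order)))          # ['dma', 'dsp0']
print(sorted(arbitrate(reqs, rotate(order))))  # ['dsp0', 'dsp1']
```

With 48 banks, the DMA engine and several DSPs can all hit the shared RAM in the same cycle as long as they land in different banks, which is one plausible reading of the concurrent-access claim.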
The Streaming DMA Engine is said to be “Multichannel with programmable descriptors, IOC” and to use a scatter/gather architecture. Multichannel and programmable are kind of a must if you want to support multiple streams. Scatter/gather is just as necessary if you want to move data with any semblance of memory and I/O efficiency. Unfortunately, that is about all we can say here; there is probably a lot of neat tech inside, but we have no way of finding out about it from AMD.
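For readers who haven’t met descriptor-driven scatter/gather DMA, here is a toy Python model of the general technique: software builds a chain of descriptors, each pointing at one scattered chunk of memory, and the engine walks the chain and gathers the chunks into one contiguous buffer. The descriptor fields are hypothetical, and reading “IOC” as interrupt-on-completion is our assumption; AMD has not published the TrueAudio descriptor format.

```python
# Toy scatter/gather DMA engine: walk a chain of programmable descriptors and
# gather scattered chunks of source memory into one contiguous buffer.
# Assumptions: the descriptor fields, and reading "IOC" as interrupt-on-completion,
# are our guesses; AMD has not published TrueAudio's descriptor format.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Descriptor:
    src: int                                  # byte offset of this chunk in source memory
    length: int                               # number of bytes to copy
    ioc: bool = False                         # signal an "interrupt" when this chunk is done
    next_desc: Optional["Descriptor"] = None  # next link in the chain, None ends it


def run_dma(memory: bytes, head: Optional[Descriptor],
            on_interrupt: Callable[[int], None]) -> bytearray:
    """Process descriptors in chain order, appending each chunk to the output."""
    out = bytearray()
    index = 0
    desc = head
    while desc is not None:
        end = desc.src + desc.length
        if desc.src < 0 or end > len(memory):
            raise ValueError(f"descriptor {index} reads outside source memory")
        out += memory[desc.src:end]  # gather one scattered chunk
        if desc.ioc:
            on_interrupt(index)  # stand-in for the hardware completion interrupt
        desc = desc.next_desc
        index += 1
    return out


# Three scattered 4-byte chunks, with an interrupt requested only on the last one.
mem = b"....AAAA....BBBB....CCCC"
chain = Descriptor(4, 4, next_desc=Descriptor(12, 4, next_desc=Descriptor(20, 4, ioc=True)))
fired: list[int] = []
print(run_dma(mem, chain, fired.append))  # bytearray(b'AAAABBBBCCCC')
print(fired)                              # [2]
```

Requesting the interrupt only on the last descriptor is the usual reason to have such a flag at all: the host gets poked once per batch instead of once per chunk. A multichannel engine would presumably walk one such chain per channel or stream.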
Last up is the Bus Interface, long be its name. It is of course low latency, because you really need that to handle lots of audio streams with the kind of positional accuracy TrueAudio is capable of. It can carve out a 64MB window in the frame buffer for its needs, more than enough for audio purposes. The slides say it is also windowed into the AMD TrueAudio system memory space, but we have no clue what that actually is. We did ask AMD, though.
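To put “more than enough for audio purposes” in perspective, here is a quick Python calculation of how much uncompressed audio a 64MB window could hold. The 48kHz, 16-bit PCM format and the reading of 64MB as 64 MiB are our assumptions for illustration; AMD didn’t say how the window is actually used.

```python
# Back-of-the-envelope check on the 64MB frame buffer window.
# Assumptions (ours): 64MB means 64 MiB, and the window holds raw PCM samples.
# Real use would likely mix compressed data, buffers, and DSP working state.
WINDOW_BYTES = 64 * 1024 * 1024


def seconds_of_pcm(window_bytes: int, rate_hz: int, bits: int, channels: int) -> float:
    """How many seconds of uncompressed PCM audio fit in window_bytes."""
    bytes_per_second = rate_hz * (bits // 8) * channels
    return window_bytes / bytes_per_second


mono = seconds_of_pcm(WINDOW_BYTES, 48_000, 16, 1)
stereo = seconds_of_pcm(WINDOW_BYTES, 48_000, 16, 2)
print(f"48kHz/16-bit mono:   {mono:.0f} s ({mono / 100:.1f} s each for 100 streams)")
print(f"48kHz/16-bit stereo: {stereo:.0f} s")
```

That works out to roughly 699 seconds, about 11.7 minutes, of 48kHz/16-bit mono PCM, or about 7 seconds apiece even if split across 100 streams, and compressed MP3 or Ogg Vorbis data would stretch it much further.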
So what do you get out of this whole TrueAudio block? At the moment AMD says it can handle up to 100 mono MP3 streams or, much more interestingly, 30 Ogg Vorbis streams, but it is very early in the driver and software development lifecycle. Expect these numbers to grow a lot from the mere 100 streams TrueAudio can deliver at the moment.
What makes AMD’s solution stand out, however, is the way it is implemented. First, TrueAudio takes effectively no CPU resources to run; it is set up by the CPU and then does all the heavy lifting in hardware. Traditional PC games are made to run on a wide variety of CPUs with similarly wide varieties of GPU and sound hardware available. Sound processing gets the leftover cycles, if any, from the rest of the game. If you dedicate resources to sound, you run the risk of dropping frames; do the opposite and you get lousy or inconsistent audio. With a dedicated audio DSP like the one in TrueAudio, a game dev can get hundreds of 3D positional streams for effectively zero CPU overhead.
Did you catch the part about 3D positional audio? That is the key feature that differentiates TrueAudio from a mere sound card, be it a top-of-the-line 3D-capable one or awful integrated misery. Why? If you want to do real 3D positional audio in a game, you have to take the scene/world geometry, send it to the CPU to set up a stream, modify the sound data, and send the result off to the sound card to turn into audible output, like this.
It works, but each PCIe bus traversal adds latency, the 3D positioning takes massive CPU power, and you get all sorts of synchronization headaches. That is if you can actually pull it off; if your engine and middleware don’t do the geometry processing for sound modifications, you have a lot of code to write. Let’s just say that even in the best of times it isn’t exactly easy or fun to do, and even if you do it right, there is no guarantee that the hardware running your game has enough power to process it without clicks, pops, lag, or dropped frames.
This is where TrueAudio shines. The DSPs can DMA the 3D data and all other needed info directly from the GPU’s memory without any CPU intervention. The sounds can be transformed and positioned by the DSP itself, which has more than enough raw horsepower to do that. Remember, the Tensilica HiFi-EP cores are DSPs at heart, and they live to do this kind of stuff.
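To make “transformed and positioned” concrete, here is a minimal Python sketch of the per-source math a positional audio engine typically runs for every stream: inverse-distance attenuation, a constant-power stereo pan from the source’s angle relative to the listener, and a propagation delay. This is a generic textbook approach picked for illustration; AMD hasn’t said what algorithms its TrueAudio libraries use, and the function name and parameters are hypothetical.

```python
# Minimal per-source 3D positioning: distance attenuation, constant-power stereo
# panning, and propagation delay. A generic textbook approach chosen for
# illustration; AMD hasn't disclosed TrueAudio's algorithms, and real engines
# add HRTFs, occlusion, reverb, and Doppler on top of this.
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def position_source(listener: tuple[float, float], facing_deg: float,
                    source: tuple[float, float], rate_hz: int = 48_000,
                    ref_dist: float = 1.0) -> tuple[float, float, int]:
    """Return (left_gain, right_gain, delay_samples) for one mono source.

    Positions are (x, y) in meters on the horizontal plane. facing_deg is the
    listener's heading: 0 looks down +y, positive angles turn clockwise toward +x.
    """
    dx = source[0] - listener[0]
    dy = source[1] - listener[1]
    dist = math.hypot(dx, dy)
    gain = ref_dist / max(dist, ref_dist)  # inverse-distance law, capped at 1.0 up close
    azimuth = math.atan2(dx, dy) - math.radians(facing_deg)  # angle off the listener's nose
    pan = math.sin(azimuth)                # -1 = hard left, 0 = center, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0    # map pan to 0..pi/2 for the constant-power law
    left = gain * math.cos(theta)
    right = gain * math.sin(theta)
    delay = round(dist / SPEED_OF_SOUND * rate_hz)  # travel time in whole samples
    return left, right, delay


# Hypothetical sources around a listener at the origin facing +y.
for src in [(-3.0, 0.0), (0.0, 2.0), (10.0, 0.0)]:
    l, r, d = position_source((0.0, 0.0), 0.0, src)
    print(f"{src}: L={l:.3f} R={r:.3f} delay={d} samples")
```

Every source is computed independently of every other, so the work scales linearly with stream count and parallelizes trivially, which is exactly the kind of per-stream grind better handed to dedicated DSPs than squeezed into leftover CPU cycles.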
In short, you can have hundreds of 3D positional audio streams all transformed, tweaked, and modified to the developer’s mad desires for essentially free. Better yet, the resources to do this are realtime, consistent, and in theory the same across any system with TrueAudio. Maybe. Probably, anyway; at least we hope so. This is such a massive step up from even the best PC audio hardware that it is almost indescribable; like Mantle, it is really a sea change once again.
Early in this article we said TrueAudio offered guaranteed performance, locality, features that just weren’t possible before, and compatibility, in increasing order of importance. You get the guaranteed performance from dedicated DSPs and locality from being on the GPU with DMA access to its memory, but what about previously impossible features and compatibility? Then there are the Xbox One and PlayStation 4 tie-ins, which is where TrueAudio really shines. Let’s take a look at these, because they are the killer apps for AMD.
Disclosures: Charlie Demerjian and Stone Arch Networking Services, Inc. have no consulting relationships, investment relationships, or hold any investment positions with any of the companies mentioned in this report.