XBox One’s sound block is much more than audio

Hot Chips 25: Think DSP and Kinect co-processor more than beeps

This story is the third part of SemiAccurate’s look at the XBox One’s architecture as presented at the Hot Chips 25 conference. Part 1 can be found here and Part 2 can be found here.

Sound Off Times Eight:

The last major chunk of the XBox One is the audio subsystem and the eight co-processors that make it up. This is not just a sound card with a lot of channels. While the Audio Processors block does that, it also does things like positional audio and beam forming for the Kinect. Think of this more as a DSP block than a Sound Blaster. All said, the entire unit can do 15.4 GFLOPS and a claimed 18 GOPS in total, including the FP and scalar functions.

Microsoft XBox One Audio block

The sound blocks and some of the connections

You might notice that on the right hand side of the diagram there is a coherent connection to the CPU and main memory. If this were a simple audio block there would be no need for CPU coherency, nor for the AXI bridge that links everything together. AXI is the ARM bus, so why would you put that in an audio unit? Think there is more to this than beeps now?

The yellowish stuff on the bottom is actually a traditional sound unit that should be able to do everything you need for the latest high bit rate, absurd channel count audio outputs, but this is a given in modern hardware. What this block says to me is that the XBO can also do some nifty in-game directional audio for those wearing headphones or with 19.5 surround setups at home, a noticeable leap from this kind of thing.

How the functionality is split between the simple audio cores and the units above wasn’t specified, but it is likely determined by the programmer. That said, the coherent links from the GPU come in to the DMA engine, which is specifically associated with the audio functionality. The DSP Control Core, Scalar Core, and two Vector Cores all have their own discrete caches and share 64K of local SRAM as a kind of DSP L2 that is local to them. The two vector cores work out to 2 x 128 bits / 8 = 32 bytes per cycle, so if the unit can do 15.4 GFLOPS, that would put the speed at a little under 500MHz if MS counts a FLOP as 8 bits wide.
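For anyone who wants to check that back-of-the-envelope math, here is a minimal sketch in Python. It rests on assumptions Microsoft did not confirm: that all 15.4 GFLOPS come from the two 128-bit vector cores, and that each 8-bit lane counts as one operation per cycle. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope clock estimate for the audio DSP vector cores.
# Assumptions (not confirmed by Microsoft): all 15.4 GFLOPS come from the
# two 128-bit vector cores, and each 8-bit lane counts as one op per cycle.

VECTOR_CORES = 2
VECTOR_WIDTH_BITS = 128
LANE_WIDTH_BITS = 8       # assumed width of one counted "FLOP"
CLAIMED_GFLOPS = 15.4     # figure quoted at Hot Chips 25


def ops_per_cycle(cores, vector_bits, lane_bits):
    """Number of lane-wide operations retired per clock across all cores."""
    return cores * vector_bits // lane_bits


def implied_clock_mhz(gflops, per_cycle):
    """Clock speed in MHz needed to reach `gflops` at `per_cycle` ops/clock."""
    return gflops * 1000.0 / per_cycle


lanes = ops_per_cycle(VECTOR_CORES, VECTOR_WIDTH_BITS, LANE_WIDTH_BITS)
clock = implied_clock_mhz(CLAIMED_GFLOPS, lanes)
print(f"{lanes} ops/cycle -> ~{clock:.0f} MHz")
```

If a FLOP were instead counted as 32 bits wide, changing `LANE_WIDTH_BITS` to 32 would imply a clock four times higher, which is why the 8-bit reading looks like the plausible one.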

This whole unit was an internal Microsoft design rather than an off-the-shelf IP block. Since the Kinect uses the DSPs heavily for positioning sound this makes sense, especially considering the coherency requirements with the rest of the system components. One notable absence, however, is specific ties to the PCIe, coherent DMA<->GPU links, and other I/Os listed in the system diagram earlier.

While these paths are undoubtedly there to keep the Kinect latency down, more clarity on them would have been nice. In any event, the whole Audio Processors unit is more of a serious DSP co-processor with capabilities to make melodic beeps rather than the other way around.

The End Result

In the end what do we have? A 5+ billion transistor SoC built on TSMC’s 28nm HP process. The 361mm^2 die size, ~381mm^2 counting scribe area, lines up nicely with similar transistor count AMD GPUs made on the same process. Unfortunately no TDP was disclosed or even hinted at. All the caches on the die total up to about 47MB so that will dominate the SoC’s area, about half the die according to the architect, much of which is distributed to the various units.

XBox One SoC picture

The XBox One SoC and package

All of the major units are coherent with each other, but the GPU also has a direct non-coherent path to main memory that is basically AMD’s Garlic bus, a must for any modern CPU+GPU/APU/SoC. (Note: There isn’t a direct equivalent to Onion in the XBO SoC; that functionality is done with Microsoft’s IP.) More interesting is that while the 32MB of onboard memory is coherent with the CPU, there is not a direct link between the two. How badly the added latency, if any, will hurt CPU code using this cache is not public but should make for some interesting arguments. In any case every block is coherent with every memory space and the system MMUs can make it all look seamless even if on-die and off-die memories are in different address spaces.

The system as a whole has a massive chunk of hardware devoted to AV tasks, enough to make background encoding, decoding, playback, scaling, and the rest about as seamless as possible to the gamer. AV should just work and gaming should just work regardless of what the other is doing, something the PS4 should do as well but in a very different way. Then again the majority of this functionality is available in a <$100 Android box so it isn’t a standout feature if you follow tech at all.

Then there are the GPUs themselves, the Achilles heel of the XBox One. While there is nothing wrong with them per se, they are a slightly older revision than the one used in the PS4, but the differences are small enough to be ignorable. What does matter is that the PS4 has about 50% more units at roughly the same clocks, 1152 at ~800MHz vs 768 at 853MHz, a massive difference. Couple this with the vastly more user-friendly 8GB GDDR5 memory design and you have a clean kill for Sony on performance.
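To put rough numbers on that gap, here is a small Python sketch of peak shader throughput from the unit counts and clocks above. It assumes each shader unit retires one fused multiply-add, counted as 2 FLOPs, per clock. That is the usual way peak numbers for this class of GPU are quoted, but it is an assumption here, not something from the Hot Chips talk, and the names are purely illustrative.

```python
# Rough peak-throughput comparison from the unit counts and clocks above.
# Assumption: each shader unit retires one fused multiply-add (2 FLOPs)
# per clock. Peak numbers say nothing about real-world game performance.

FLOPS_PER_UNIT_PER_CLOCK = 2  # assumed: one FMA per unit per clock


def peak_gflops(units, clock_mhz):
    """Theoretical peak single-precision GFLOPS for a shader array."""
    return units * FLOPS_PER_UNIT_PER_CLOCK * clock_mhz / 1000.0


ps4 = peak_gflops(1152, 800)   # ~800MHz is approximate
xbo = peak_gflops(768, 853)

unit_ratio = 1152 / 768
throughput_ratio = ps4 / xbo

print(f"PS4: {ps4:.0f} GFLOPS, XBox One: {xbo:.0f} GFLOPS")
print(f"Units: {unit_ratio:.2f}x, peak throughput: {throughput_ratio:.2f}x")
```

The slightly higher XBox One clock claws back a little, so the 50% unit advantage works out to roughly a 40% peak throughput advantage under these assumptions.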

Microsoft made a really impressive SoC that is a multimedia monster with a bit of gaming ability. Technically speaking, it is quite impressive that they pulled it off. Not to take anything away from the hardware designers, but Microsoft management simply aimed wrong. Sony made a gaming machine, Microsoft did not. Sony made a clean design for coders, Microsoft did not. Microsoft made a complex technical masterpiece that is in a no-man’s land between a far cheaper <$100 Android media center and the PS4. Sony just did right for gamers and won the battle. S|A

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic.