TODAY, AMD IS launching two new cards from its Northern Islands family of GPUs, the HD6850 and HD6870. These cards, both based on the ‘Barts’ chip, are mid-range players that will redefine mainstream performance and devastate Nvidia’s margins.
First of all, what is Barts, and what is Northern Islands (NI)? NI is the new family of GPUs consisting of Turks, Caicos, Barts, Cayman, and the dual-Cayman part called Antilles. Barts is the mid-range player, sitting slightly above Juniper, aka the 5770. Both Northern Islands and the 5000 series Evergreen parts are built on TSMC’s 40nm process.
The raw numbers on Barts
The HD6870 is the bigger of the two parts; the main difference is that it has 1120 shaders while the HD6850 has ‘only’ 960. The 5870 that this card replaces has 1600 shaders and the 5850 has 1440, yet the 6870 performs slightly better than the 5850 despite that card’s much higher shader count. Why? Details, more details, and some magic speed powder.
6870 specs at long last
The first detail is the most obvious one: clock speed. The 6870 runs at 900MHz, as opposed to 850MHz for the 5870 and 725MHz for the 5850. This alone accounts for a lot of the speed difference. The 6850 runs at a ‘mere’ 775MHz. High clock speeds usually take a toll on power, and we are happy to say that ATI took this into account and did a good job of lowering power use.
The HD6870 has exactly the same TDP as the 5850, 151W, but idle power is down from 27W to 19W. The 6850 pulls only 127W TDP and needs just one six-pin power connector instead of the 6870’s two. Since the slot plus two six-pin connectors can deliver up to 225W, the 6870 has plenty of power headroom and should overclock quite well. The 5870 sucks down 188W TDP, but it is also quite a bit more powerful. In any case, Barts lowered power use quite a bit.
Update: Fixed two wrong part numbers/typos in the above paragraph.
That is extremely impressive considering that the Barts GPU has almost everything the 5850 has, other than the SIMD count. Barts has 14 SIMDs to Cypress/HD5870’s 20 (18 in 5850 guise), yet Barts is only 76% of the size of its older cousin. Barts weighs in at 255mm^2 and 1.7 billion transistors versus 334mm^2 and 2.15 billion for Cypress.
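The 76% figure follows directly from the quoted die sizes. A quick Python check using only the numbers above also shows that Barts packs its transistors slightly more densely than Cypress; the dictionary layout and function names are purely illustrative.

```python
# Die size and transistor counts for Barts and Cypress, as quoted in the article.
BARTS = {"area_mm2": 255, "transistors": 1.7e9}
CYPRESS = {"area_mm2": 334, "transistors": 2.15e9}


def area_ratio(a, b):
    """Fraction of chip b's die area that chip a occupies."""
    return a["area_mm2"] / b["area_mm2"]


def density_m_per_mm2(chip):
    """Transistor density in millions of transistors per square millimetre."""
    return chip["transistors"] / chip["area_mm2"] / 1e6


print(f"Barts/Cypress area: {area_ratio(BARTS, CYPRESS):.1%}")       # 76.3%
print(f"Barts density:   {density_m_per_mm2(BARTS):.2f} M/mm^2")     # 6.67
print(f"Cypress density: {density_m_per_mm2(CYPRESS):.2f} M/mm^2")   # 6.44
```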
No changes from this angle
Some units, most notably the texture units, went down, from 72 to 56, but that is almost entirely made up for by the raw clock difference. The same holds true for the shader counts: much of the deficit is made up for by simply running fewer shaders faster. Clock covers up a lot of problems when things are bad, and puts the boot in when they are already good. Leakage on TSMC40? Not ATI’s problem.
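The ‘made up for by clock’ argument is simple units-times-clock arithmetic. This sketch compares the 6870 with the 5850 using the unit counts and clocks quoted in the article, and assumes two FLOPs (one multiply-add) per shader per clock, the usual basis for peak-rate figures. Peak numbers say nothing about real-world efficiency, so treat this as a rough sanity check only.

```python
# Unit counts and core clocks quoted in the article.
CARDS = {
    "HD6870": {"shaders": 1120, "tmus": 56, "clock_mhz": 900},
    "HD5850": {"shaders": 1440, "tmus": 72, "clock_mhz": 725},
}


def peak_gflops(card, flops_per_shader_clock=2):
    """Peak shader throughput in GFLOPS, assuming one multiply-add (2 FLOPs) per shader per clock."""
    return card["shaders"] * flops_per_shader_clock * card["clock_mhz"] / 1000


def gtexels(card):
    """Peak texture rate in gigatexels per second: one texel per texture unit per clock."""
    return card["tmus"] * card["clock_mhz"] / 1000


new, old = CARDS["HD6870"], CARDS["HD5850"]
print(f"shader throughput ratio:  {peak_gflops(new) / peak_gflops(old):.3f}")  # 0.966
print(f"texture throughput ratio: {gtexels(new) / gtexels(old):.3f}")          # 0.966
print(f"clock ratio:              {new['clock_mhz'] / old['clock_mhz']:.3f}")  # 1.241
```

With 22% fewer units running 24% faster, the 6870 lands within about 4% of the 5850 on paper for both shading and texturing, which is why the unit cuts barely show.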
So with everything running faster, tessellation runs about 25% faster too, right? Well, not really. ATI also put a lot of effort into beefing up the parts around the tessellator. The tessellator is only one part of the tessellation chain; two sets of shaders, the hull shader and the domain shader, sit before and after the tessellator itself.
The tessellator itself just splits triangles into sub-triangles; the two shaders do most of the heavy lifting to make things pretty and right. Think of the chain as one set of shaders neatly arranging vegetables on a cutting board, the tessellator chopping them with a cleaver, and the other set of shaders artfully arranging the pieces.
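To make the cutting-board analogy concrete, here is a toy Python model of the three stages. The function names and the uniform split are illustrative assumptions, not how ATI’s hardware or the Direct3D 11 API actually expose these stages. The point is only that the fixed-function tessellator does the simple chopping, while the shader stages before and after it decide how finely to chop and where the new vertices end up.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def hull_stage(patch: List[Vec3], factor: int) -> Tuple[List[Vec3], int]:
    """Hypothetical hull stage: pass the control points through and choose a tessellation factor."""
    return patch, max(1, factor)


def tessellator(factor: int):
    """Fixed-function stage: uniformly split one triangle into factor**2 sub-triangles.

    Returns barycentric (u, v, w) domain points plus index triples into that list.
    """
    n = factor
    index: Dict[Tuple[int, int], int] = {}
    points: List[Vec3] = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            index[(i, j)] = len(points)
            points.append((i / n, j / n, (n - i - j) / n))
    tris = []
    for i in range(n):
        for j in range(n - i):
            # "Upward" sub-triangle.
            tris.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            # "Downward" sub-triangle filling the gap, where one exists.
            if i + j < n - 1:
                tris.append((index[(i + 1, j)], index[(i + 1, j + 1)], index[(i, j + 1)]))
    return points, tris


def domain_stage(patch: List[Vec3], points: List[Vec3]) -> List[Vec3]:
    """Hypothetical domain stage: place each new vertex by interpolating the patch corners."""
    a, b, c = patch
    return [tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3)) for u, v, w in points]


patch, factor = hull_stage([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], 4)
points, tris = tessellator(factor)
positions = domain_stage(patch, points)
print(len(points), len(tris))  # 15 16
```

A real domain shader would typically also displace the new vertices, for example from a height map, which is where the ‘pretty’ part comes from.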
Tessellation performance graph
For ATI, the bottleneck was in the hull and domain shaders, so both were heavily massaged and updated, bringing tessellation performance to 2x that of the 5850. Although this sounds like a lot, the older chips were more than capable of putting up a triangle per pixel on any monitor in existence at any displayable frame rate, so the gain is basically a matter of counting how many angels can dance on the head of a pin. Either that or fighting benchmarks that are made simply to exploit architectural differences.
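The triangle-per-pixel claim is easy to sanity-check. The Python below computes the triangle rate needed for one triangle per pixel at 2560x1600 and 60fps, and compares it with a front end that handles one triangle per clock at the 5850’s 725MHz. The one-triangle-per-clock figure is an assumption for illustration, not a number from ATI.

```python
def triangles_per_second_needed(width, height, fps, triangles_per_pixel=1.0):
    """Triangle rate needed to draw `triangles_per_pixel` triangles for every pixel, every frame."""
    return width * height * fps * triangles_per_pixel


def setup_rate(clock_mhz, triangles_per_clock=1.0):
    """Triangles per second a GPU front end handles, assuming a fixed number per clock."""
    return clock_mhz * 1e6 * triangles_per_clock


need = triangles_per_second_needed(2560, 1600, 60)  # 245,760,000 triangles/s
have = setup_rate(725)                              # 725,000,000 under the 1-per-clock assumption
print(f"{need / 1e6:.1f}M needed vs {have / 1e6:.0f}M available")
```

Under that assumption, even a 30-inch panel at 60fps needs only about a third of what the older front end could push.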
That brings us to memory, both speed and performance. Barts has a 256-bit-wide GDDR5 interface running at 4.2GHz effective in the 6870; the 6850 and 5850 both run at 4.0GHz, and the 5870 runs even faster at 4.8GHz. This puts the 6850 at 128GBps of memory bandwidth and the 6870 at 134.4GBps. ATI engineers came so tantalizingly close to the magic 134.7GBps transfer rate it isn’t funny, but they couldn’t make it.
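Peak memory bandwidth is the bus width in bytes times the effective data rate per pin. Treating the quoted ‘GHz effective’ figures as gigabits per second per pin, and noting that the 5870 also uses a 256-bit bus, a few lines of Python reproduce the article’s numbers. The function name is just for illustration.

```python
def memory_bandwidth_gbps(bus_width_bits: int, effective_gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bytes moved per transfer times the effective data rate."""
    return bus_width_bits / 8 * effective_gbps_per_pin


# 256-bit GDDR5 at the effective rates quoted in the article.
for card, rate in [("HD6850", 4.0), ("HD5850", 4.0), ("HD6870", 4.2), ("HD5870", 4.8)]:
    print(card, round(memory_bandwidth_gbps(256, rate), 1))
```

This prints 128.0 for the HD6850 and HD5850, 134.4 for the HD6870, and 153.6 for the HD5870.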
Needless to say, Cayman will blow them all out of the water soon, but Barts should be in the ‘more than enough memory bandwidth’ category. Not only is its memory faster than the 5850’s, but the ‘next generation’ memory controller adds a year of learning and ups efficiency by quite a bit. One reason that Evergreen didn’t widen the memory bus over RV770 is that efficiency went up so much it wasn’t necessary. This time the leap isn’t as big, but it is still an improvement.
UVD3 block diagram
Next on the list is video, specifically hardware decode and encode. For decoding, three things were added in the upgrade from UVD2 to UVD3. First, the newer silicon now fully supports MPEG-2, including entropy decode and in-loop deblocking, so MPEG-2 is now decoded end to end in hardware.
On top of that, Blu-ray 3D support was added in the form of the Multiview Video Coding (MVC) codec. Given the adoption rates of 3D by consumers, this killer app should drive tens of sales if the weather stays nice. MPEG-4 Part 2, aka Xvid and DivX, is also hardware accelerated. All of this would be nice if ATI’s drivers supported acceleration on anything other than Windows, but sadly they do not. Because of this, the UVD block is nothing more than wasted silicon.
On a more positive note, there is a new anti-aliasing (AA) mode in the 6000 series cards called Morphological Anti-Aliasing (MOO). Unlike the traditional AA modes, MOO is a post-processing step. After a scene is rendered, MOO goes over the finished image, looks for edges, and smooths out the jaggies.
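Below is a deliberately simplified post-process edge smoother in Python, meant only to show the ‘render first, then hunt for edges’ idea. It is not AMD’s implementation: real morphological AA classifies the shape of each edge (L, Z and U patterns) and computes coverage-based blend weights, whereas this sketch just averages flagged pixels with their neighbours. The threshold value and function names are assumptions for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale luminance, 0.0 (black) to 1.0 (white)


def find_edges(img: Image, threshold: float = 0.25) -> List[List[bool]]:
    """Flag both pixels on either side of any sharp luminance step to the right or below."""
    h, w = len(img), len(img[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if x + 1 < w and abs(img[y][x] - img[y][x + 1]) > threshold:
                edges[y][x] = edges[y][x + 1] = True
            if y + 1 < h and abs(img[y][x] - img[y + 1][x]) > threshold:
                edges[y][x] = edges[y + 1][x] = True
    return edges


def smooth_edges(img: Image, threshold: float = 0.25) -> Image:
    """Post-process pass: replace each flagged pixel with the mean of itself and its 4 neighbours."""
    h, w = len(img), len(img[0])
    edges = find_edges(img, threshold)
    out = [row[:] for row in img]  # read from img, write to out, so pass order does not matter
    for y in range(h):
        for x in range(w):
            if not edges[y][x]:
                continue
            samples = [img[y][x]]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    samples.append(img[ny][nx])
            out[y][x] = sum(samples) / len(samples)
    return out


# A hard vertical edge between columns 1 and 2 of an already-rendered frame.
frame = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
smoothed = smooth_edges(frame)
print(smoothed[1])  # [0.0, 0.2, 0.8, 1.0]
```

Because the pass only needs the final pixels, nothing about the game’s rendering path matters, which is the property the article leans on next.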
MOO on a closeup light from AvP
Because it operates on the finished image, it should work everywhere: on any game, or even with non-game software. Even if a competitor decides to lock ATI out of AA modes and puts contractual wording in their brib^h^h^h^h paperwork to prevent a company from doing the right thing, MOO should work.
ATI claims it is faster than supersampling, which is pretty obvious, and about as fast as CFAA, but it works on all edges. Basically it is better, faster, and compatible with all DX9/10/11 games. Best of all, if your game is CPU bound, the idle shaders can be put to use for MOO, something that is a clear win on higher-end cards or Crossfire rigs. Could this be the killer Crossfire app?
Most readers are aware of the massive advances in anisotropic filtering (AF) that started with the 5000 series of cards. Those cards brought perfect angle-independent AF, but there were some slight problems. In some circumstances, the transitions between filters within a mip map level were not smooth, resulting in visible banding in some tests.
While this was a vast improvement over the older cards, it wasn’t perfect, and some people were bothered by it. In NI, the transitions are now perfectly smooth, and the last vestiges of AF imperfection are banished for good. Unfortunately, these details are subtle, so pictures showing them need to be very high rez. In lieu of huge downloads, imagine one of those textured tunnel pictures with slight banding on the left side, labeled ‘before’, and none on the right, labeled ‘after’. There, teh aswum, see?
Last up, we have the display controllers, all six of them. On the 5000/Evergreen line, ATI changed the game by making Eyefinity, its name for 3+ monitor support, a standard feature. Because there were six independent display controllers, AIBs could make cards with six monitor ports for almost no cost. Some, like PowerColor, made even odder configurations such as five-monitor cards.
Going from two to six monitors was a huge step, but AMD has not stood still. Some of the additions are obvious, others are not. The obvious ones are native DisplayPort 1.2 and HDMI 1.4a support in hardware. HDMI is nice but proprietary and expensive. Version 1.4a basically adds Blu-ray 3D support, so all your DRM dreams will come true with full hardware acceleration. Think of it as handcuffs that go on very fast.
DisplayPort 1.2, on the other hand, is much more useful: it lets you daisy-chain monitors, or split one DP output into several via a hub. The hub is like a USB hub, but with video ports instead of USB ports; think of it as a breakout box or laptop dock. The short story is that you can now get six monitors from two ports, and that is a huge step forward.
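Whether three monitors really fit on one DisplayPort 1.2 output comes down to link bandwidth. The sketch below does that arithmetic in Python under stated assumptions: an HBR2 link of four lanes at 5.4Gbps each with 8b/10b coding, 24-bit color, VESA reduced-blanking pixel clocks, and no allowance for multi-stream (MST) overhead. None of these figures come from AMD, and the function names are hypothetical.

```python
# Assumed DisplayPort 1.2 (HBR2) link figures: 4 lanes at 5.4 Gbps raw each, 8b/10b coding.
LANES = 4
GBPS_PER_LANE = 5.4
CODING_EFFICIENCY = 8 / 10
LINK_PAYLOAD_GBPS = LANES * GBPS_PER_LANE * CODING_EFFICIENCY  # about 17.28 Gbps


def stream_gbps(pixel_clock_mhz: float, bits_per_pixel: int = 24) -> float:
    """Bandwidth one uncompressed video stream needs; the pixel clock already includes blanking."""
    return pixel_clock_mhz * bits_per_pixel / 1000


def max_streams(pixel_clock_mhz: float, bits_per_pixel: int = 24) -> int:
    """How many identical streams fit on one link, ignoring MST packet overhead."""
    return int(LINK_PAYLOAD_GBPS // stream_gbps(pixel_clock_mhz, bits_per_pixel))


# Reduced-blanking pixel clocks for two common 60Hz modes (assumed values).
print(max_streams(154.0))  # 1920x1200 at 60Hz -> 4
print(max_streams(268.5))  # 2560x1600 at 60Hz -> 2
```

Under those assumptions a single port has room for four 1920x1200 panels, so three per port, and six from two ports, fits with some margin; 2560x1600 panels would not.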
DisplayPort 1.2 brings the octopus home
That part is really cool, but it gets better. The new display controllers are completely independent from each other. The ‘old way’, circa the dark ages (2009), forced all displays to share the same rez, refresh rate, and everything else. If you had two of one monitor and one of another, you had to pick which ones looked bad or simply didn’t work.
The new controllers allow you to do whatever you want, however you want, and it just works. All screens can be run at completely different resolutions, refresh rates, orientations, and even color corrected separately. That is hugely impressive and extremely flexible, not to mention practical and useful.
Some of this is limited by Windows, however. The DRM that OS inflicts, selling users out to the content MAFIAA, limits what you can do, but on Linux it should just work, ATI drivers willing. In any case, the limits are no longer in the hardware.
If this isn’t enough, the added bandwidth now allows for 7.1 channel uncompressed audio with all the Dolby and related trademarked technologies you wish to append. Basically, the 6000 series of cards does what you want it to do.
In the end, you have two cards, the HD6850 and HD6870, that are priced to kill. They have MSRPs of $179 and $239 respectively, and given that they are faster than the equivalent Nvidia GTX460 cards by tens of percent, kill they will.
The cards should be in plentiful supply by the time you read this, with the usual caveat that demand outstrips supply for most hot new products. One thing you will not see is a repeat of last year’s 40nm supply problems. TSMC seems to have ironed out 40nm, and on its third-generation 40nm part, ATI is more than comfortable with the process’s nuances.
If Barts seems a bit underwhelming to you, think about this: it is the mid-range part. If ATI’s mid-range part destroys the GTX460 and threatens the GTX470, which has more than 2x its die area, what do you think the high-end Cayman will do? That part is far from what you would expect; the fun is just beginning. S|A
Note: Due to tight time schedules, SemiAccurate will have a full set of tests on these products coming tomorrow.
By Charlie Demerjian