AMD LAUNCHED ITS first new GPU architecture in four years last week, Cayman, aka the HD6900 series. Let’s take a look at the cards themselves and how they perform, with a lot of emphasis on the new PowerTune feature.
The Architecture and Cards:
We looked at the architecture in a previous article, so we won’t go over that part again. Instead, let’s take a look at the cards themselves, the HD6950 and HD6970. At 389mm^2, the chips are a little bigger than their 334mm^2 HD5870 predecessor, and are made on the same TSMC 40nm bulk process.
The raw numbers
On the physical side, we come to the size and weight of the cards. Both are a little over 10.75 inches long, and the bare PCBs are a bit under 10.5 inches long. HD6900 cards are heavy little beasties, a little under 2.25 pounds for the 6950 and a little over 2.25 for the 6970. I have no explanation for this 1/100th of a pound difference other than unit to unit variation, or that the 6970 was made with a little extra love.
By comparison, the Sapphire 6850 weighs about a pound even, while the 6870 reference design is a lot heavier at 1.95 pounds. Basically, Sapphire saved a lot of weight, and therefore cost, by not using an overkill cooler. This isn’t to say that the Sapphire card is bad in any way; if you look at the test Max did, it ran vastly cooler than the reference design, so it looks like Sapphire simply made a better, smaller, and lighter solution.
Rounding things out, the 5830 weighs in at 2.1 pounds, a likely nod to the relaxed TDP, and the 5870 Eyefinity-6 is 1.95 pounds. Going back to the 4-series, we have the PowerColor 4890 at 1.75 pounds, the reference 4870 at 1.7, and the 4870 X2 at 2.3 pounds. Now you know.
6970 on top, 6950 underneath, front and back
The cards themselves are clean enough, with only 8 DRAMs for the 2GB on board. This means two things: they are using high density 2Gb GDDR5 chips, and there are a lot more memory options available. 2Gb chips aren’t cheap relative to 1Gb parts, but if you want a lot of memory on a card, they are the only way to get there.
There are 8 more memory blanks on the back of the cards, so a 2GB card from 16 1Gb chips is an option, as are 8 1Gb chips for 1GB total. If you want to make things even more interesting, a 4GB 6970 is within reach should anyone need that much memory, as is an 8GB 6990. Anyone want to place a bet that the people buying that card will have less RAM in their PC?
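The capacity math behind these options is simple: total memory is the number of chips times the density of each chip, divided by 8 bits per byte. A quick sketch, assuming 16 pads per GPU (8 populated on the front plus the 8 blanks on the back), and assuming the 6990 would be a dual-GPU card with that same layout per GPU:

```python
def capacity_gb(chips: int, density_gbit: int, gpus: int = 1) -> float:
    """Board memory in gigabytes: chips x density (gigabits) / 8 bits per byte."""
    return gpus * chips * density_gbit / 8


configs = {
    "shipping 6900 (8 x 2Gb)": capacity_gb(8, 2),
    "16 x 1Gb": capacity_gb(16, 1),
    "8 x 1Gb": capacity_gb(8, 1),
    "4GB 6970 (16 x 2Gb)": capacity_gb(16, 2),
    # Assumption: a dual-GPU 6990 with 16 x 2Gb chips on each GPU.
    "6990, 2 GPUs x 16 x 2Gb": capacity_gb(16, 2, gpus=2),
}
for name, gb in configs.items():
    print(f"{name}: {gb:g}GB")
```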
In either case, the current 2GB cards retail for $369 (6970) and $299 (6950), a bit high on the latter. As soon as initial demand eases off, you should see 1GB versions as well as 4GB ones. High speed 1Gb GDDR5 parts cost about $3-3.50 each, so that should cut a good chunk off the price. The idea of a 1GB 6950 at $275 or so is quite tantalizing, especially for Crossfire.
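Using the quoted $3 to $3.50 per high speed 1Gb chip, the DRAM bill for a 1GB (8 x 1Gb) card works out as below. The 2Gb chip price isn’t given here, so this sketch only covers the 1Gb side of the comparison, not the actual savings:

```python
def dram_cost(chips: int, price_low: float, price_high: float) -> tuple[float, float]:
    """Return the (low, high) DRAM cost for a board, given a per-chip price range."""
    return chips * price_low, chips * price_high


low, high = dram_cost(8, 3.00, 3.50)
print(f"8 x 1Gb GDDR5: ${low:.2f} to ${high:.2f}")  # $24.00 to $28.00
```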
The switch, long may it confuse
One of the most unexpected features of the 6900 series is the BIOS switch. There has been a lot of speculation about what these two switches do. The first one is the more important of the two: it selects between two BIOSes, that’s it. While that may not sound earth-shattering, if you are an overclocker, or just like fiddling with settings, you can flash test BIOSes without needlessly worrying about bricking your card. If you screw it up, you flip the switch, reboot, and flash a corrected image over the bad one. This will save a lot of RMAs, not to mention hairlines.
The second switch is actually a Home Depot brand switch cleverly attached to the rear of the card with duct tape. Our mock-up was good enough to fool quite a few people in green t-shirts. It does look like it came from the factory, doesn’t it? Weep for the state of humanity.
Another new feature is EQAA, or Enhanced Quality Anti-Aliasing. It is a new way of sub-sampling an image to smooth out jagged edges. There are two kinds of samples that you can use with AA, color and coverage. Color does just what it sounds like, and coverage is basically whether or not that part of a pixel is covered by the polygon in question.
EQAA sample methodologies
What EQAA does is allow you to vary the color and coverage sample counts independently. You don’t need as many color samples as you do coverage samples, so the color samples are set one step below the coverage samples, for example 4 color samples paired with 8 coverage samples, but theoretically you could use different splits.
The end result is a better picture for the same memory footprint as more traditional AA modes, or a smaller memory footprint for the same quality. For high resolution screens using numerically higher AA sampling, this can be the difference between swapping textures out to system memory or not, and avoiding that swap is a huge performance boost.
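To see why decoupling the two sample counts saves memory, here is a simplified model. It assumes each color sample is a 32-bit RGBA value and each coverage sample only stores an index pointing at one of the pixel’s color samples, costing ceil(log2(color samples)) bits. Real hardware adds depth, compression, and other bookkeeping, so treat the absolute numbers as illustrative only:

```python
def index_bits(color_samples: int) -> int:
    """Bits a coverage sample needs to point at one of the color samples."""
    return (color_samples - 1).bit_length()


def bits_per_pixel(color_samples: int, coverage_samples: int) -> int:
    """Simplified AA buffer cost: 32-bit color samples plus per-coverage-sample indices."""
    return color_samples * 32 + coverage_samples * index_bits(color_samples)


def buffer_mib(width: int, height: int, color_samples: int, coverage_samples: int) -> float:
    """Size of the color/coverage buffer in MiB under this simplified model."""
    return width * height * bits_per_pixel(color_samples, coverage_samples) / 8 / 2**20


# (color samples, coverage samples) for each mode
modes = {"4x MSAA": (4, 4), "4x EQAA (4 color, 8 coverage)": (4, 8), "8x MSAA": (8, 8)}
for name, (color, coverage) in modes.items():
    print(f"{name}: {buffer_mib(2560, 1600, color, coverage):.1f} MiB at 2560x1600")
```

In this model the EQAA mode gets the 8 coverage samples of 8x MSAA for roughly half the memory, which is the trade-off described above.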
The heart of this review centers on the newest feature of the 6900 series, PowerTune. Rather than run the same performance tests that everyone else is doing, we decided to take a hard left turn and try something different.
PowerTune lets you set a hard cap on the power consumed by the GPU, regardless of clock speed settings, temperature, and other factors. It works by mapping the activity seen by known performance counters to known power consumption values, basically a big lookup table with some attached smarts. This allows PowerTune to sample a lot of locations and ramp the power up and down very quickly.
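A minimal sketch of the counter-based idea follows. A weighted sum stands in for AMD’s lookup table, and the block names, weights, and static power figure are all hypothetical placeholders, since the real values are not public. The point is only that power is estimated from activity counts rather than measured, so an estimate is available as fast as the counters can be read:

```python
# Hypothetical per-block weights: watts drawn at 100% activity at the 880MHz stock clock.
WEIGHTS_W = {"shader": 120.0, "texture": 25.0, "rop": 20.0, "memory": 15.0}
STATIC_W = 20.0  # hypothetical leakage/idle power


def estimate_power(utilization: dict[str, float], clock_mhz: float,
                   max_clock_mhz: float = 880.0) -> float:
    """Estimate power from activity counters (0.0-1.0 per block), scaled linearly with clock."""
    dynamic = sum(WEIGHTS_W[block] * utilization.get(block, 0.0) for block in WEIGHTS_W)
    return STATIC_W + dynamic * clock_mhz / max_clock_mhz


game = {"shader": 0.7, "texture": 0.6, "rop": 0.5, "memory": 0.6}           # hypothetical
power_virus = {"shader": 1.0, "texture": 1.0, "rop": 1.0, "memory": 1.0}    # hypothetical
print(round(estimate_power(game, 880), 1))         # 138.0
print(round(estimate_power(power_virus, 880), 1))  # 200.0
```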
Older methods of controlling power tend to be analog, and their feedback is slow. A thermometer changes many orders of magnitude more slowly than a GPU clock, so PowerTune can react much faster than anything else out there. This speed allows it to do things that were previously unthinkable, like monitoring power in real time and gracefully capping it when it goes over preset limits.
To the user, it simply means that instead of a card getting really hot, then dropping suddenly in a big way, clocks just nudge up to a boundary and are pushed gently back down. The user should never notice, and gameplay should be much smoother as a result. That is the theory anyway.
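The difference can be sketched as a control loop that nudges the clock in small steps against a power cap, instead of waiting for a temperature trip and then dropping clocks hard. The linear power-versus-clock model, the 10MHz step, and the 500MHz floor are assumptions for illustration, not AMD’s actual algorithm; the cap here is the article’s 190W figure scaled by the PowerTune percentage:

```python
BASE_CAP_W = 190.0  # the card's TDP as quoted in this article
MAX_CLOCK = 880     # 6970 stock core clock, MHz
MIN_CLOCK = 500     # hypothetical floor, MHz
STEP = 10           # hypothetical adjustment step, MHz


def power_at(clock_mhz: int, load_watts_at_max: float) -> float:
    """Toy model: for a given workload, power scales linearly with clock."""
    return load_watts_at_max * clock_mhz / MAX_CLOCK


def next_clock(clock: int, load_watts_at_max: float, powertune_pct: int) -> int:
    """One control step: nudge down if over the cap, back up if the next step still fits."""
    cap = BASE_CAP_W * (1 + powertune_pct / 100)
    if power_at(clock, load_watts_at_max) > cap:
        return max(MIN_CLOCK, clock - STEP)
    if clock < MAX_CLOCK and power_at(clock + STEP, load_watts_at_max) <= cap:
        return clock + STEP
    return clock


def settle(load_watts_at_max: float, powertune_pct: int, steps: int = 100) -> int:
    """Run the loop from stock clock and return where the clock settles."""
    clock = MAX_CLOCK
    for _ in range(steps):
        clock = next_clock(clock, load_watts_at_max, powertune_pct)
    return clock


# A workload that would draw 250W at full clock, at three PowerTune settings.
for pct in (-20, 0, 20):
    print(pct, settle(250, pct))  # settles at 530, 660, then 800 MHz
```

A light load (say 150W at full clock) never hits the cap, so `settle(150, 0)` simply stays at 880MHz, matching the "it doesn’t step in" behavior described for the higher settings.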
The Test setup:
To test this, we took the same test rig as before, an Intel i7-970 at 3.2GHz, a Coolermaster Hyper 212 Plus heatsink, a Gigabyte X58A-UD3R motherboard, and 6GB of OCZ Platinum DDR3/1600. The only change was that the long suffering WD Caviar RE2 died between this test and the last, so it was replaced with an Intel X25-E 32GB SSD backed by a Seagate Barracuda 7200.11 320GB HD. All apps were installed on the Seagate drive, and the OS and swap file were on the Intel drive. All of this was plugged into an Enermax Galaxy Evo 1250W PSU.
Catalyst 10.12 RC2 drivers were used. Their successors, the 10.12 RC3s, are currently available as the Catalyst 10.12a Hotfix package on AMD’s web site. All settings were left at their defaults, with only the PowerTune level changed during testing.
Power use was measured with an Extech 380801 Power Analyzer and logged on an external PC. Power samples were taken every half second, the maximum rate the 380801 allows. This reading covers the whole system except the display, including the mouse, keyboard, and spare Home Depot switch. Frame rates were measured with Fraps for everything but Metro 2033, where we used its built-in benchmark tool. Two of the tests, 3DMark11 and AvP, conflicted with Fraps and produced obviously incorrect frame rates, so those are left out of the results.
All of the tests except Furmark were run with PowerTune set at -20%, -10%, 0%, +10%, and +20%. Furmark was tested in 5% steps. A stock 6970 reference card was used for all tests.
For the results, we will start with Metro 2033. The built-in benchmark was used, and the settings were 2560*1600, DX11, Very High Quality, 4x MSAA, 16x AF, Tessellation on, PhysX off, and DOF off. Let’s look at the raw numbers first, followed by a graph of the same.
Metro 2033 FPS graphs
FPS at -20% PowerTune
FPS at 0% PowerTune
FPS at +20% PowerTune
As you can see, the average frame rate goes up noticeably between PowerTune -20% and -10%, and then a little more between -10% and zero. The average FPS figures at zero, +10%, and +20% are within about 0.5FPS of each other. This says that PowerTune is stepping in a lot at -20%, a noticeable amount at -10%, and just rounding a few peaks off at zero. At +10% and +20%, if PowerTune steps in at all, it is barely noticeable.
What happens to the frame rate during the run? For brevity, we will only show the -20%, 0%, and +20% graphs; +10% is essentially identical to +20%, and -10% falls between -20% and zero. All graphs are from the middle of the three runs.
Metro 2033 power use
The averages for the graphs, rounded to the nearest Watt, are 238W, 294W, 311W, 308W, and 308W for the -20%, -10%, 0%, +10%, and +20% PowerTune settings respectively. Since the test system idled at 135W, that leaves 103W, 159W, 176W, 173W, and 173W above idle for each setting, covering both CPU and GPU draw.
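The idle subtraction is simple arithmetic on the logged averages; here is a quick check using the Metro 2033 figures above. Keep in mind that "above idle" is the extra load from the CPU and GPU combined, measured at the wall, not the GPU alone:

```python
IDLE_W = 135  # measured system idle at the wall

# PowerTune setting (%) -> average system power (W) for Metro 2033
metro_avg_w = {-20: 238, -10: 294, 0: 311, 10: 308, 20: 308}
above_idle = {pct: watts - IDLE_W for pct, watts in metro_avg_w.items()}
print(above_idle)  # {-20: 103, -10: 159, 0: 176, 10: 173, 20: 173}
```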
Most noticeable from all this is that the 0%, +10% and +20% settings are basically the same. As was indicated by the frame rates, at 0%, PowerTune is barely kicking in, just a little here and there. At +10% and +20%, if it is firing off, I can’t tell where.
The -10% case is a little odd: about halfway through the graph, things get very weird. Between samples 1 and 59 (29.5 seconds at the 0.5 second sample rate), power use averages 299W. The rest of the graph averages 236W, a 63W difference. -20% is also not what we expected; its power use is far lower than it should be, and this is entirely reflected in the frame rates. Unfortunately, we have no good explanation for why the -10% and -20% readings behave this way.
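Splitting a power log into windows by sample number is how figures like "samples 1 to 59" are produced. A sketch, assuming the log is a plain list of wattage readings taken every 0.5 seconds and that sample numbers are 1-based as in the text; the helper names are hypothetical:

```python
from statistics import mean

SAMPLE_INTERVAL_S = 0.5  # Extech 380801 maximum logging rate, per the test setup


def window_seconds(first: int, last: int) -> float:
    """Duration covered by 1-based samples first..last inclusive."""
    return (last - first + 1) * SAMPLE_INTERVAL_S


def window_mean(log_w: list[float], first: int, last: int) -> float:
    """Average power over 1-based samples first..last inclusive."""
    return mean(log_w[first - 1:last])


print(window_seconds(1, 59))  # 29.5
print(299 - 236)              # gap between the two halves of the -10% run: 63
```

The same helpers cover the AvP window later on: samples 30 to 230 span `window_seconds(30, 230)`, or 100.5 seconds.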
One thing we can say conclusively is that the card gets nowhere near its TDP on Metro 2033. The 0%, +10%, and +20% results are basically the same, within experimental error of each other. That says the cards’ 190W TDP is nowhere near being reached in this game.
Aliens vs Predator:
Aliens vs Predator is another interesting case. It was tested using the downloadable demo, mainly because that is simple and repeatable. This program disagreed with Fraps, so there are no FPS readings here. That said, the power graphs showed none of the oddness that Metro 2033 did.
AvP power use
Taking an average of the power from samples 30-230, basically the entire demo with the intro and ending cut scenes removed, gives us power use of 253W, 280W, 312W, 316W, and 318W. Going from -20% to -10% adds 27W, -10% to 0% adds 32W, 0% to +10% adds 4W, and +10% to +20% adds only 2W. The power use goes up and hugs the cap nicely as expected, and a bit of that increase is due to more CPU load from the higher frame rates.
From this, a 10% step in PowerTune adds roughly 30W at the wall, or at least can add that much when the card is power limited. Subtracting out the test PC’s idle of 135W, we see a max of 183W used for both CPU and GPU, so once again the card stays under its 190W TDP, especially since part of that figure is CPU load. The most intrusive PowerTune gets is rounding off the occasional peak throughout the approximately 2 minute demo, basically where the yellow line sits below the other two in the middle of the run. In any case, there was no noticeable change in how the demo ran.
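The step-by-step deltas can be checked with a few lines, using the AvP averages quoted above:

```python
IDLE_W = 135  # measured system idle at the wall
avp_avg_w = [253, 280, 312, 316, 318]  # -20%, -10%, 0%, +10%, +20%

# Difference between each PowerTune setting and the one below it
deltas = [b - a for a, b in zip(avp_avg_w, avp_avg_w[1:])]
print(deltas)                   # [27, 32, 4, 2]
print(sum(deltas[:2]) / 2)      # 29.5, the mean of the two power-limited steps
print(max(avp_avg_w) - IDLE_W)  # 183
```

Only the first two steps, where the card is actually power limited, show a large change; above 0% the deltas shrink to a few Watts.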
Once again, PowerTune does its job unnoticed and correctly, and does not impact gameplay. If you wanted to be a bit cynical, you could say that the card’s TDP appears to be set a tad lower than it really is, but only by less than 6W on average.
3DMark11 is an interesting case for two reasons. First, only -20% seems to activate the PowerTune limits. The other four levels are right on top of each other. This means that 3DMark11 doesn’t come close to pushing the TDP of the 6970 cards. The difference between -20% and the rest is between 15 and 30W, about what you would expect from the settings. Keep in mind that CPU load changes dramatically between tests too, especially in the physics-based ones.
3DMark power use
The other thing to note is how much longer the last test takes on the -20% setting. If you remember back to the architecture article, there was a slide about different power strategies. AMD’s claim was that with power containment, aka PowerTune, you could actually get certain tasks done faster. The last peak of the 3DMark11 power graph sure seems to indicate that the claims are at least somewhat based on the real world.
Once again, PowerTune did its job and stayed within the expected limits. This is starting to sound a bit tired, but the data backs it up.
Unigine’s Heaven 2.1 demo was run at 2560*1600 with everything set to high, trilinear filtering, 8xAA, and 16xAF. Tessellation was set to normal, and thankfully, Fraps didn’t take a strong dislike to this program like it did to the last two. Fraps and the internal frame counter agreed with each other to within tenths of an FPS, so it looks like the readings are dead on.
Unigine Heaven FPS
The scores were 388, 621, 705, 704, and 705 for PowerTune at -20%, -10%, 0%, +10%, and +20%. FPS as reported by the engine were 15.4, 24.6, 28, 28 and 28 respectively. As far as scores and FPS are concerned, PowerTune simply does not kick in at 0% and above settings.
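Normalizing the scores to the default 0% setting makes the scale of the throttling clearer. A quick sketch using the Heaven numbers above; the helper name is hypothetical:

```python
# PowerTune setting (%) -> (Heaven score, engine-reported FPS), from the runs above
heaven = {-20: (388, 15.4), -10: (621, 24.6), 0: (705, 28.0), 10: (704, 28.0), 20: (705, 28.0)}


def relative_to_default(results: dict[int, tuple[int, float]], pct: int) -> float:
    """Score at a PowerTune setting as a fraction of the 0% (default) score."""
    return results[pct][0] / results[0][0]


for pct, (score, fps) in heaven.items():
    print(f"{pct:+d}%: {relative_to_default(heaven, pct):.0%} of default, {fps} FPS")
```

At -20% the card delivers about 55% of its default score and at -10% about 88%, while the 0%, +10%, and +20% scores differ by a single point, which is noise.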
Unigine Heaven power use
Moving right along to the wattage graph, you can see that Heaven produces some of the same odd drop-offs as Metro 2033 at the -10% setting, and the -20% level produces the same unusually low readings. The wattage used, from the lowest setting to the highest, is 220W, 279W, 302W, 301W, and 304W. The rest is what you would expect: PowerTune doesn’t kick in at 0% at all, the TDP is set right, and the world goes on as normal.
Last up, we did something different, taking Furmark and stepping the PowerTune settings up by 5% every 30 seconds. Power was measured as normal, and the GPU clocks were measured by GPU-Z. Furmark was started out with PowerTune at -20%, and stepped up to +20%, then briefly back down to idle. The power graph was cut a little closer than the GPU speed graph for clarity, mainly so you could see the idle clocks.
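For reading the Furmark graphs, it helps to map time into the run onto the PowerTune setting in effect. A sketch assuming exactly 30 seconds per step starting at -20% at time zero, with power sampled every 0.5 seconds; real step boundaries may not line up this neatly, and the final ramp back down to idle is not modeled:

```python
STEP_S = 30                       # seconds per PowerTune setting
SAMPLE_INTERVAL_S = 0.5           # power log sampling interval
LEVELS = list(range(-20, 25, 5))  # -20% to +20% in 5% steps


def setting_at(t_seconds: float) -> int:
    """PowerTune setting in effect t seconds into the stepped run (held at +20% afterwards)."""
    index = min(int(t_seconds // STEP_S), len(LEVELS) - 1)
    return LEVELS[index]


def setting_at_sample(sample: int) -> int:
    """Same lookup for a 1-based power-log sample number."""
    return setting_at((sample - 1) * SAMPLE_INTERVAL_S)


print(len(LEVELS), len(LEVELS) * STEP_S)  # prints: 9 270
```

Each setting therefore covers 60 power samples, which is roughly the width of one step on the power graph.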
Furmark power use
Furmark GPU speeds
Interestingly, you can see the transitions between PowerTune states as brief spikes to the 880MHz clock speed, so the changes are easy to spot. At -20% and -15%, the GPU sits at around 500MHz, but jumps around a lot. Upping the power to -10%, the clock drops to about 460MHz and still jumps around quite a bit.
Going to -5%, the clock moves up to around 540MHz, but starts jumping around like an ADHD-addled journalist in an enterprise technology briefing. At 0%, it settles down a lot, and speeds go up very close to the 600MHz that others have reported. +5% brings another speed bump to a bit over 700MHz and stabilizes the jumping a lot.
+10% adds about 100MHz to the total, but the clocks seem to fluctuate between there and 880MHz. It is on the verge of power throttling there. At +15% and +20%, the GPU remains locked on to the 880MHz limit, meaning that the GPU isn’t being scaled at that point.
More interestingly, power use does not increase much at the +10% to +15% transition, and stays a bit high even when things are ramped back down to 0% at the end. You can see the 0% levels where the clocks are roughly the same, in the middle of the graph steps and again after the plateau at 880MHz. It is sitting at around 600MHz at both points.
The only potential bone of contention is that the system power goes from about 135W idle to 400W total, about 75W more than any of the other tests push it. This means that Furmark does push things beyond the official TDP of the card like it used to, and PowerTune does not seem to be able to totally catch up with it. It still works, that is totally clear, but it does not lock things down quite as hard as it does in the other programs.
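A rough check of how far Furmark goes. The wall figure also includes CPU load and PSU losses, while leaving out the card’s own idle draw, so this is only a coarse comparison against the 190W TDP quoted in this article:

```python
IDLE_W = 135           # test system idle, at the wall
TDP_W = 190            # card TDP as quoted in this article
furmark_total_w = 400  # approximate Furmark system draw from the graph

furmark_above_idle_w = furmark_total_w - IDLE_W
excess_over_tdp_w = furmark_above_idle_w - TDP_W
print(furmark_above_idle_w, excess_over_tdp_w)  # 265 75
```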
In the end, a few things become clear. PowerTune does work, and as long as you are hitting the preset power limit, it will indeed cap you in a non-intrusive way. Once you set it to a level that is above the power used, it simply does not step in, much less cap anything.
If you add in even a little margin for increased CPU power use between idle and load, the TDP changes are well within the expected range of a 190W card. The power is capped at a set level, and changes do affect both clock speed and Wattage used as claimed. Basically it does what it says, but it isn’t absolute, as Furmark shows. Then again, if it ever flat out fails, the traditional temperature and amperage based controls should still step in to prevent the magic smoke from getting out. I am told the HD6900 series has blue-grey smoke powering the chips.
The card itself is clearly worlds apart from its Evergreen/5xxx and even 68xx predecessors. It is a whole new shader architecture, and just at the beginning of the optimization curve. Cayman/6900 delivers on the promises AMD made in private over the summer, and sports a few new features on top of that. It is going to be VERY interesting to see what happens with better drivers in a few months. S|A
By Charlie Demerjian