Altera puts hard FP functions in Arria 10 and Stratix 10

Outsized performance boost if you use FP in your designs

Altera today announced a very interesting new feature on its Arria 10 and Stratix 10 FPGAs: a hard floating point (FP) unit. In the PC world this wouldn’t garner a second look, but for FPGAs it is a first, with big implications and huge benefits if you use FP in your FPGA designs.

The idea sounds simple enough: put hard FP functions in the DSP blocks of an FPGA. But there are trade-offs. The biggest one is that those transistors take up area, about 1% of the die according to Altera, and that is reflected in the cost of the chip. Another way to look at the cost is that FPGAs are all about packing in as many gates as possible, and that 1% of area could instead give 1% more usable logic to customers who almost always want more.

FPGAs are generally thought of as fixed point only devices because that is what they offer the customer; if you want FP, you generally have to implement it yourself. Some people want FP in FPGAs enough to write their own routines or do the conversion in programmable logic. This eats gates, lots and lots of gates, takes a lot of time, and is generally hard to accomplish, but there is demand.

With the release of the Cyclone, Arria, and Stratix V families, Altera released their own soft FP implementation that they estimate shaves several months off a typical implementation cycle. This is mainly due to verification times, or the lack thereof: making IEEE 754 compliant logic is hard, and verifying it all is harder still. With a standard verified reference design provided by Altera, things become much easier and faster for designers.
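To give a feel for what any soft implementation has to handle, here is a minimal Python sketch that splits a single precision IEEE 754 value into its sign, exponent, and fraction fields and names the special cases the format defines. It only illustrates the number format, not Altera's logic, and the function names `decode_float32` and `classify` are made up for this example.

```python
import struct


def decode_float32(x):
    """Split a value, rounded to IEEE 754 single precision, into its bit fields."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction


def classify(exponent, fraction):
    """Name the encoding class that the exponent and fraction fields select."""
    if exponent == 0:
        return "zero" if fraction == 0 else "subnormal"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"


if __name__ == "__main__":
    for value in (1.0, -0.15625, 0.0, float("inf"), float("nan")):
        s, e, f = decode_float32(value)
        print(value, s, e, hex(f), classify(e, f))
```

Every one of those classes needs its own handling in adders and multipliers, plus rounding, which is a large part of why verification dominates the schedule.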

While soft implementations work well, they are costly in terms of logic gates. Each DSP slice requires over 1000 LEs to implement FP functionality and even more interconnects. This severely limits timing: Altera claims 400-450MHz for hard FP devices vs 200-250MHz for the same device running soft FP. On top of this you save the power draw from 1000+ LEs that are no longer being flipped, even if they are not used elsewhere. Altera also says that the interconnects used for FP implementations are often the limiting factor for soft FP designs, so hard FP is a win on this front too.
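The arithmetic behind those claims is simple enough to sketch in Python. The clock ranges and the 1000 LE figure are the Altera numbers quoted above; the 500-block design size is purely hypothetical, and the function names are made up for this example.

```python
# Figures quoted from Altera in the text above.
HARD_FP_MHZ = (400, 450)
SOFT_FP_MHZ = (200, 250)
LES_PER_SOFT_FP_DSP = 1000  # "over 1000", so treat this as a lower bound


def clock_ratio_range(hard, soft):
    """Return the (worst, best) hard/soft clock ratio from two (low, high) ranges."""
    worst = hard[0] / soft[1]  # slowest hard FP clock vs fastest soft FP clock
    best = hard[1] / soft[0]   # fastest hard FP clock vs slowest soft FP clock
    return worst, best


def les_freed(fp_dsp_blocks, les_per_block=LES_PER_SOFT_FP_DSP):
    """Lower bound on LEs no longer needed once FP moves into hard blocks."""
    return fp_dsp_blocks * les_per_block


if __name__ == "__main__":
    worst, best = clock_ratio_range(HARD_FP_MHZ, SOFT_FP_MHZ)
    print(f"clock ratio: {worst:.2f}x to {best:.2f}x")
    # 500 FP-enabled DSP blocks is a hypothetical design size, not an Altera figure.
    print(f"LEs freed: at least {les_freed(500):,}")
```

Depending on which ends of the ranges you compare, the clock gain runs from well under 2x to a bit over it, so "almost double" is a fair summary.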

One roadblock to implementing hard FP blocks in the Arria 10 and Stratix 10 lines is that most customers use FPGAs as fixed point units. Putting FP blocks into a design was a risky move when these parts were started several years ago; would it pay off? Altera thinks the trade-off is more than worth it, because hard FP units are now found on all Arria 10 parts, shipping now, and all Stratix 10 parts, top to bottom.

Demand is said to be driven by the new OpenCL tools Altera released a bit ago, along with DSPBuilder and FP Megafunction support. They made it easier to use FP in FPGAs so, for some odd reason, customers are doing it. In an interesting twist on the normal promise and release cycle, the hardware with hard FP blocks has been on the market since last June but the enabling code won’t be ready for public release until 2H/2014. I would suspect that if you ask Altera sweetly, bat your eyelashes at your rep, and sign a few phonebooks’ worth of NDAs, you can get a reasonably close to final code drop now.

So what do you get with the hard FP blocks? That is easy: in addition to the 18b and 27b fixed point math pipes in the DSP, there is now an IEEE 754 pipe. The underlying hardware implements hard add and multiply pipes that are simultaneously usable. Once again stepping back to a PC CPU-centric world view, two pipes may seem underwhelming in the extreme, but remember they are in each DSP and are potentially replicated thousands of times per die. For the FPGA world it can almost double clocks and free up thousands of LEs and lots of interconnects per DSP.

Figure: Altera IEEE 754 pipes on Arria 10 and Stratix 10. Simple enough concept to add an IEEE 754 pipe.

If you use FP in your designs that is. If you don’t, you get about 1% less functionality from your FPGA than you would otherwise get, but since no Altera device of this generation comes without FP, too bad. If you do use FP, it is a no-brainer to add hard FP functions into your design; I can’t see anyone not doing so. If there were any reluctant designers out there, Altera says their hard FP tools will enable seamless migration of designs from older Altera soft FP designs. Point, click, and voila, almost 2x the clocks and oodles of now free blocks and interconnects. The official claim is that hard FP will save between 6-12 months vs a designer doing their own float -> FP -> verify -> verify -> ... process. For those interested in hard numbers, Altera says the Arria 10 will push 1.5TF SP FP and the forthcoming Stratix 10 will push 10TF SP FP. Compare that to an AMD R9 290X, which will do 5.6TF SP FP, for a good reference point.
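For a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes each DSP block retires one multiply and one add per cycle (2 FLOPs), which is one reading of the "simultaneously usable" pipes rather than a figure Altera stated, and it assumes the 450MHz top-end hard FP clock quoted earlier. The function names are made up for this example.

```python
def peak_flops(dsp_blocks, clock_hz, flops_per_cycle=2):
    """Peak FLOPS if every DSP block issues flops_per_cycle operations each cycle."""
    return dsp_blocks * clock_hz * flops_per_cycle


def implied_dsp_blocks(target_flops, clock_hz, flops_per_cycle=2):
    """Number of DSP blocks needed to reach target_flops under the same assumption."""
    return target_flops / (clock_hz * flops_per_cycle)


# Single precision throughput figures quoted in the article, in TFLOPS.
ARRIA_10_TFLOPS = 1.5
STRATIX_10_TFLOPS = 10.0
R9_290X_TFLOPS = 5.6

if __name__ == "__main__":
    # Assumption: 450MHz, the top of the hard FP clock range quoted earlier.
    blocks = implied_dsp_blocks(ARRIA_10_TFLOPS * 1e12, 450e6)
    print(f"Arria 10 at 450MHz implies roughly {blocks:.0f} FP-capable DSP blocks")
    print(f"Arria 10 vs R9 290X: {ARRIA_10_TFLOPS / R9_290X_TFLOPS:.2f}x")
    print(f"Stratix 10 vs R9 290X: {STRATIX_10_TFLOPS / R9_290X_TFLOPS:.2f}x")
```

Under those assumptions the Arria 10 figure works out to a block count in the thousands, which lines up with the "replicated thousands of times per die" description. Peak numbers like these say nothing about how much of that throughput a real design can sustain.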

That brings up the question of who would want FP in an FPGA; most FPGA customers seem to be doing just fine with fixed point now, thank you very much. The answer to that was a bit vague because Altera doesn’t really comment on their customers’ designs or applications. Some new markets like search engines were tossed out, but specific new markets are the wrong way to think about possible applications for this new functionality. What hard FP does is expand the reach of FPGAs into areas that they were simply not able to address before.

An example of this would be someone who was testing or using FPGAs for FP work but the performance, performance per watt, or performance per dollar just wasn’t there. With hard FP effectively doubling the clocks, freeing up resources, and lowering power, Altera at least doubled the raw performance and performance per watt, not to mention lowered the watts used overall. All this for a cost of roughly 1% die area, not a bad trade.

If you don’t use FP in your FPGA designs, the new functionality really makes no difference. If you do use FP, the new IEEE 754 pipe will change the game if the results are anything close to what Altera says they will be. FP capable FPGAs will open up parts of the market that were previously not good candidates for Altera silicon and put more pressure on bespoke ASICs, not to mention GPUs in compute. The only question now is how soon will the competition follow suit, or more to the point, how soon can they?

Note: Some of you may have noticed a few pieces of backchannel FUD on a site with tenuous ethics and intentionally undisclosed sources of funding. This is what they were trying to preempt and the methods used shine a very unflattering light on what the author previously thought was a pretty upstanding company. While sad, it is a good thing to know about, and no we won’t link them here.


Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also a council member with Gerson Lehrman Group.