Intel Releases Two New ISAs, APX and AVX10

Lots of problems solved, new ones created

Intel LogoIntel just announced two new ISAs for Granite Rapids, AVX10 and APX. SemiAccurate is just starting to dig into the details but we thought you would like to read along with us.

The two new ISAs are called AVX10, the successor to AVX-512, and APX or Advanced Performance Extensions. Both will make your life better, make the birds sing louder, and food will taste better, either that or they will just be invisible under the hood unless you are into compiler minutia. In any case we thought you would like to know what we found out so far. There are five PDFs that you should look over in your free time, all can be found here. The section in question is about half way down the page and titled, “Intel® Architecture Instruction Set Extensions Programming Reference and Related Specifications”.

First up is APX and it brings five new high level features starting with 16 new registers, new 3-operand instruction formats, new conditional features, some new save/restore bits, and a new 64b jump. Rather than retype all the details, lets just cheat and screen cap the PDF.

Intel APX ISA overview

APX in a nutshell

The short version of APX can be found in the PDF entitled, “Intel® Advanced Performance Extensions (Intel® APX) Software Enabling Introduction“, and it is deep enough for the average SemiAccurate reader to understand but doesn’t require intricate knowledge of x86 instruction minutia. If you are into x86 instruction minutia however, Intel has the PDF for you, this one is entitled, “Intel® Advanced Performance Extensions (Intel® APX) Architecture Specification“, 312 pages of unbridled fun for the assembly coder in your life. Pay particular attention to table 7.49.3 on page 192, one glance and you will understand why. Yes that is a joke, there is nothing special about the table or the page but some of you didn’t read this far before you clicked. Trust me, it is really hard to keep your sanity when writing up x86 ISA details.

To keep things short, the main thing that the APX ABI adds are new registers which will undoubtedly make everything better for everyone. On top of that they add a few new instructions to once again ease a few pain points and simplify software a bit. Most users will never know it exists or care, that is the work of compilers and those who write them. Once software catches up, APX should make things a bit more efficient by removing some legacy bottlenecks.

The AVX10 ISA is a lot more interesting to SemiAccurate and that is not just because someone finally wised up and removed the dash in the name. AVX10 is meant to be a superset of the various AVX types out there, and they were a mess. That said some of the more esoteric AVX-512 instruction sets from the Xeon Phi/Knights Something line that are not in AVX-512 now will be forever left aside to the lamentations of no one really. Basically if it is going to be supported from here on out, it will be in AVX10, if it isn’t, don’t expect it to make a comeback so there is closure.

Intel AVX10.1 and AVX10.2 feature set

What AVX10.1 and AVX10.2 bring to the table

There are three vector widths in AVX10, 128b, 256b, and 512b. The initial release of AVX10 on Granite Rapids, charmingly called AVX10.1 (pre-enabling) only guarantees support for 128b and 256b, the full 512b will be optional. Since AVX10.1 is only for Granite at the moment, and Granite is P-core only, this isn’t a big deal. 10.2 will be supported on both big and little cores, plus it will add support for 256b embedded rounding and a whole bunch of new instructions. Basically Diamond Rapids is when AVX10 gets real.

Intel AVX10 feature flags with AVX-512 subsets

What is omitted is the important bit

The big problem is that while AVX10.x supports AVX-512 instructions, it only guarantees to support AVX-512 instructions on 256b vector lengths. This is Intel basically throwing in the towel on mainstream 512b vector support. Given how painful the idiocy surrounding AVX-512 has been recently, fusing off AVX-512 in the big cores because the little cores didn’t support it, this cleanup is long overdue. Unfortunately AVX10 in 256b guise which is supposedly guaranteed in all future Intel releases won’t run 512b vector length code. It supports AVX-512 _INSTRUCTIONS_ but only if they use 128b or 256b vectors. AVX10 on CPUs with the optional 512b vector length supported in hardware should run all AVX-512 code for obvious reasons.

This is going to be painful in the short term because there is no real way of a user knowing if their AVX-512 code has 256b or 512b vectors. Yes you can figure it out if you are technically inclined but, well, you shouldn’t need to be. x86 is compatible with code going back to 1970s dishwashers for a reason, and this is a major discontinuity. That said it will clean things up in the long term but it marks the death knell for 512b vectors in the consumer space and puts it on life support for servers. Welcome to AMD’s point of view Intel, took you long enough.

If you read the Intel white paper titled, “The Converged Vector ISA: Intel® Advanced Vector Extensions 10 Technical Paper“, you understand the lengths they went to downplay the deprecation of 512b vectors. For some reason the hollow marketing screed of the last decade was just proven hollow and they don’t want to admit it. Take this gem for example, “The converged version of the Intel AVX10 vector ISA will include Intel AVX-512 vector instructions with an AVX512VL feature flag, a maximum vector register length of 256 bits, as well as eight 32-bit mask registers and new versions of 256-bit instructions supporting embedded rounding. This converged version will be supported on both P-cores and E-cores. While the converged version is limited to a maximum 256-bit vector length, Intel AVX10 itself is not limited to 256 bits, and optional 512-bit vector use is possible on supporting P-cores. Thus, Intel AVX10 carries forward all the benefits of Intel AVX-512 from the Intel ® Xeon ® with P-core product lines, supporting the key instructions, vector and mask register lengths, and capabilities that have comprised the ISA to date. Future P-core based Xeon processors will continue to support all Intel AVX-512 instructions ensuring that legacy applications continue to run without impact.” Clear? I’ll wait while you read it again.

This is going to be a nightmare for any organization with 512b code running on non-Xeon CPUs in the future. Actually since Intel does not guarantee that ANY P-core will have 512b vector support at all, Xeon or not, your mileage may vary. Knowing Intel, they will fuse off 512b support on cores that physically have it and extort the users that need it to not blow that fuse. Write 512b code for x86 at your own peril people, one set of AVX-512 messes cleaned up, another set of AVX10 messes created.

The last bit to make note of is the enumeration. Intel goes to great lengths to point out that AVX10 is a superset of all AVX-512 instructions so you don’t need to poll endless capability flags to figure out what a given processor supports, you just need to check the AVX10 version number and three bits that define the vector lengths at 128/256/512b. This is a good thing. The problem is, as we pointed out above, that lots of AVX-512 extensions are not officially supported in the latest spec so… umm… version numbers, bits, and flags it is for the foreseeable future I guess. And then there is the vector length issue in the code which technically isn’t Intel’s problem. It doesn’t look like AVX10 can do 512b vectors in two passes on 256b wide hardware so, well, we can’t explain that one either.

In the end, AVX10 will clean up a lot of pain points, eventually. Intel has FINALLY come out and said that the extensions that were not supported in mainstream AVX-512 are basically dead from here on out. When that code base dies off, this change will clean up a lot but the 512b vector bearing code will be a pain point for a long time to come. With the waffling on optional 512b support, Intel is creating a massive new headache for no real reason. Rather than just knife 512b vectors going forward and making tools to help port existing software, they added complexity. One step forward….S|A

The following two tabs change content below.

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate
Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic. FullyAccurate