HSA updated to v1.1 with new features

Computex 2016: Backwards compatible, forward looking

May 31, 2016 by Charlie Demerjian

The HSA Foundation is announcing v1.1 of their spec today with some important changes. Since SemiAccurate first brought you the news of HSA years ago, the slow progress forward has been picking up pace quickly.

Just over a year from the March 2015 launch of v1.0 of the HSA spec comes the new revision. Since fully HSA1.0 compatible devices are just hitting the market now, there is a Dell box imminent with all the features enabled on a Carrrizo desktop, how long will we have to wait for v1.1 hardware? That is the beauty of the v1.1 spec, there are no hardware changes required so if you are HSA1.0 compatible you are HSA1.1 compatible.

In light of this it may not seem like v1.1 brings much to the table but that is not the case. Understanding those differences may take a pretty technical mind but most SemiAccurate readers will probably keep up. If you aren’t familiar with the current state of HSA, we won’t rehash it in detail but start here and here and here.

The basics don’t change with v1.1 which is no surprise in light of the hardware compatibility, you still compile your code to the HSAIL intermediate language and that still uses the hQ structure to pass data. Memory is still pinned rather than copied ad infinitum, and efficiency of data use is still a top priority. What the new spec offers are features that most wanted earlier and wider support.

First up is the spreading of HSA to a wider range of hardware. Currently it only supports a CPU and GPU as targets because, well, that’s all the hardware that was out there. Looking at the breadth of HSA Foundation members it is clear that support will add a wider class of devices and v1.1 delivers just that. Image processors, DSPs, NICs, and a whole host of other ISA are now officially supported, and nearly anything with a bunch of parallel processors can be added. Look for more to be added as soon as hardware from supporting vendors is official announced.

Software and drivers are a key point in this type of multi-vendor interaction and HSA1.1 supports that in a few new ways. There are transparent features now and mulit-vendor IP blocks are better integrated too. Also added are several key pieces of information that can be explicitly queried too so if you want to know where a page table resides you can actually get that answer directly. This was a big request in older versions, sometimes you need to find information that a generic abstraction can’t provide.

Speaking of generic abstractions in v1.1 we have a vendor neutral generic driver, something that makes sense for a virtual ISA like the one HSA provides. Think of this driver as the layer from the system to HSA, each hardware vendor still needs to provide a specific driver for their hardware. What this will hopefully do is simplify and standardize the coding process for the user and software writer. If you think about it, abstracting the hardware from the user perspective is a key enabler for multiple classes of hardware, and more direct polling of structures does most of the rest.

Another big gap in the 1.0 spec surrounds pausing of threads. As you might be aware the hQ structure and massively threaded programming does a lot of pausing of threads and HSA is no exception. HSA1.0 has mechanisms for pausing and restarting threads but 1.1 takes it a step further with multi-wait signals. A thread can now wait for more than one signal and apply some simple logic to the unpause process. This may seem basic but it wasn’t there and now is.

Multi-wait signaling can be combined with so called forward progress rules, also new. This does exactly what it sounds like, it guarantees a thread will make progress under certain conditions. Between the two new features you can implement QoS and service guarantees, a necessary step for many media applications especially real-time ones. With HSA1.1 you don’t have to write it all yourself, the primitives are there for you and work across multiple hardware blocks.

Memory has a lot of new features too starting with a formal memory model definition. If you think about it, HSA already has both big and little endian hardware supported and with new additions a formal model will make life a lot saner for programmers. Non-temporal memory accesses, basically cache eviction after use, and multi-agent sharing of images is also now official, the latter being effectively access to pinned memory spaces from multiple blocks.

That brings us to the tools and here we start with the finalizer. It now allows linkage to standard code objects and supports versioning. The first of these is pretty self explanatory, the second tends to be more familiar to the non-consumer space. Versioning effectively allows the code to specify which variant of the finalizer it needs and more importantly get it. This may seem trivial to non-programmers but you just need to look at modern shenanigans by Microsoft and forced march upgrades to realize how valuable flexible versioning is.

Moving farther down the tool chain there is a new profiling API that supports multiple agents. Massively threaded code running across multiple differing hardware blocks can be a tad problematic to optimize and that is where the new API comes in. Depending on the hardware it can work in realtime or just gather data for future examination. There is now direct access to hardware counters and timestamps as well. Gaming is an obvious example of where this will pay off but media work will see huge benefits as well. If nothing else this will save a lot of programmer hair from hitting the floor after being torn out in chunks.

Last up we have added language support with the headliner being Python and the Numba NumPy aware compiler. It is available on GitHub with direct HSA support and does automatic parallization of supported functions. Java, Javascript, OpenCL, and C++ were supported too but the new C++17 standard will probably include a standard template library for HSA as well. All the HSA1.0 tools and toolchains will work with v1.1 as well and most new features will likely be supported in the near future.

Those are the major points of the new HSA1.1 spec. It runs on the same hardware, adds a few languages and profiling tools, and brings a lot of new hardware possibilities into the mix. To support this there is a standard virtual ISA, explicit querying of some structures, and a much more robust signaling and wait model. To the user it should work better on more devices, to the programmer it should be simpler to support and easier to fix when things go wrong. For the hardware vendors it can only increase the available software for their offerings, what’s not to like?S|A

Bio
Latest Posts

Charlie Demerjian

Roving engine of chaos and snide remarks at SemiAccurate

Charlie Demerjian is the founder of Stone Arch Networking Services and SemiAccurate.com. SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, securing and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture. As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends. Charlie is also available through Guidepoint and Mosaic. FullyAccurate

Latest posts by Charlie Demerjian (see all)

What is Qualcomm’s Purwa/X Pro SoC? - Apr 19, 2024
Intel Announces their NXE: 5000 High NA EUV Tool - Apr 18, 2024
AMD outs MI300 plans… sort of - Apr 11, 2024
Qualcomm is planning a lot of Nuvia/X-Elite announcements - Mar 25, 2024
Why is there an Altera FPGA on QTS Birch Stream boards? - Mar 12, 2024

Thank you, Subscribers!

Thank you to our Subscribers, past and present.

You are appreciated.

You are what keeps SemiAccurate going, what allows us to maintain our journalism, what keeps us ad-free, what allows us to tell it like it is, it is still just you. You, the reader and subscriber, we thank you.

If you want to know more about subscriptions, both free and paid, the information can be found here.

For more on our track record of leading edge journalism see Fully Accurate.
Our Writers

Charlie Demerjian is the founder of Stone Arch Networking Services and S|A.

SemiAccurate.com is a technology news site; addressing hardware design, software selection, customization, security and maintenance, with over one million views per month. He is a technologist and analyst specializing in semiconductors, system and network architecture.

As head writer of SemiAccurate.com, he regularly advises writers, analysts, and industry executives on technical matters and long lead industry trends.

Thomas Ryan is a GIS Programmer and freelance technology writer from Seattle, WA. You can find his work on SemiAccurate and PCWorld.
Tweets from https://twitter.com/SemiAccurate/lists/writers

SemiAccurate

On Target Technology News

Hot Article AMD to differentiate cores

Hot Article Intel foundry customer bails out

Hot Article Coffee Lake is going to impact Intel’s margins

Hot Article SemiAccurate digs up Intel Coffee Lake specs

HSA updated to v1.1 with new features

Computex 2016: Backwards compatible, forward looking

Charlie Demerjian

Latest posts by Charlie Demerjian (see all)