Ada 2022 'parallel' implementation beta for FSF GCC/GNAT

I am very pleased to announce that the core Ada 2022 “parallel” features have been implemented for mainline FSF GNAT as part of a successful Google Summer of Code project. The patch is now ready for beta testing.

We are preparing to formally submit this patch to the FSF GCC project so that it can be incorporated into GCC trunk and, from there, all future FSF GCC releases. Before we make that submission, we would like to gather additional feedback from the Ada community.

This patch introduces most of the core capabilities of the parallel keyword, including:

  • Parallel loops

  • Parallel blocks

  • Early exit

  • Chunking

This patch does NOT yet support parallel iterators, but they are being actively worked on and will hopefully be completed soon.
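To give a flavor of the supported constructs, here is a minimal sketch of a parallel loop with chunking and a parallel block (illustrative names only; compile with -gnat2022):

procedure Demo is
   Data : array (1 .. 1_000) of Integer := (others => 1);
   A, B : Integer := 0;
begin
   --  Parallel loop, divided into at most 8 chunks
   parallel (8)
   for I in Data'Range loop
      Data (I) := Data (I) * 2;  --  iterations are independent: no data race
   end loop;

   --  Parallel block: the two arms may run concurrently
   parallel do
      A := 1;
   and
      B := 2;
   end do;
end Demo;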

The GSoC project work was hosted on the Ada Rapporteur Group’s own GCC mirror github repo, and the stable version of the parallel beta release currently lives at https://github.com/Ada-Rapporteur-Group/gcc-mirror/tree/devel/arg-proto/ada2022-parallel-release.

This branch can be built and bootstrapped as-is on most mainstream platforms using FSF GCC 15, with the standard build process. No additional libraries or build flags are needed, as the parallel features do not introduce any new compiler, runtime, or platform dependencies.

Additionally, Maxim Reznik has made available binary builds of the parallel support beta for most popular platforms. He also includes instructions on how to get going with Alire. Look for the “GCC with parallel PREVIEW” release at https://github.com/reznikmm/GNAT-FSF-builds/releases. Expand the “Assets” area at the bottom of the section to download binary builds for your platform.

By default, GNAT expands parallel loops and blocks into sequential (regular) loops and blocks, respectively. To get actual parallel execution of parallel constructs, an Ada “lightweight threading” library (formally, an Ada subsystem in the sense of Ada Reference Manual 10.1-3) must be present. GNAT detects the presence of the “LWT” subsystem at compile time and, if it is “withed”, generates calls to the LWT subsystem during expansion. It is therefore conventional for the unit containing the main subprogram to “with” LWT. Refer to the example programs included with the reference LWT subsystem to see how this works.
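In the simplest case, a main unit would look roughly like this (a minimal sketch, assuming, per the above, that with'ing the root LWT package is enough to enable parallel expansion):

with LWT;  --  makes the LWT subsystem visible; GNAT detects this at
           --  compile time and expands parallel constructs into LWT calls
procedure Main is
   Data : array (1 .. 100) of Integer := (others => 3);
begin
   parallel
   for I in Data'Range loop
      Data (I) := Data (I) ** 2;
   end loop;
end Main;

If LWT is not “withed”, the same unit still compiles, and the loop is simply expanded as a regular sequential loop.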

A reference LWT subsystem implementation currently lives under the ParaSail language project (https://github.com/parasail-lang/parasail). It may be separated from the ParaSail repository at a later time. This LWT implementation is also an official Alire crate of the same name, and Maxim’s instructions detail how to install LWT and use the beta toolchain under Alire.

The reference LWT subsystem could potentially be incorporated into the GNAT standard library (libgnat) at some point in the future, but it was decided to keep the first iteration as simple and digestible as possible.

This is only the first phase, and we look forward to additional refinements in the future!

***

Note that Alire is NOT required to beta test this build; simply making the sources of the LWT reference implementation available to the compiler is sufficient (using -I with gcc or gnatmake).

There are also some Ada 2022 parallel example programs under lwt/a22_examples, and these can be built and run with the FSF GNAT toolchain directly (no Alire) as follows (Linux/UNIX):

$ git clone https://github.com/parasail-lang/parasail
$ cd parasail/lwt/a22_examples
$ gnatmake -gnat2022 -I../ n_queens.adb
$ ./n_queens

***

We would love to have more members of the Ada community try out these features. Please let me know if you need any support getting set up, or if you have any other questions.

Can parallel be used in SPARK?

I’m having fun playing with this, thanks for everyone’s hard work!

I’ve built a somewhat contrived example of a game physics system that performs something equivalent to Position := Position + Velocity * Delta_Time; for 1 million entities. I’m iterating over large arrays of Float. It autovectorizes nicely on a single core before adding parallel.
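The hot loop is essentially this shape (a simplified sketch of my setup, with made-up names; the real version has separate components per axis):

procedure Physics_Step is
   type Float_Array is array (1 .. 1_000_000) of Float;
   type Float_Array_Ptr is access Float_Array;
   --  Heap-allocated so a million floats don't land on the stack
   Position   : constant Float_Array_Ptr := new Float_Array'(others => 0.0);
   Velocity   : constant Float_Array_Ptr := new Float_Array'(others => 1.0);
   Delta_Time : constant Float := 1.0 / 60.0;
begin
   parallel
   for I in Position'Range loop
      Position (I) := Position (I) + Velocity (I) * Delta_Time;
   end loop;
end Physics_Step;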

I’ve observed that everything is much faster if workers are pinned to specific CPU cores. For example, my laptop has an Intel 258V with 4x “Performance” P-cores and 4x “Efficiency” E-cores. My program runs much faster if pinned to the P-cores. If a worker gets scheduled on the E-cores, it causes the whole process to slow down. Even when restricting it to the P-cores with taskset, migrating between them degrades cache efficiency. Maybe this is out of scope for the runtime, but it seems like heterogeneous CPU configurations are commonplace and pinning each worker to a single CPU should be straightforward, if not an automatic default. I think this is something that would be implemented in LWT?
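For comparison, ordinary Ada tasks can already be pinned with the standard CPU aspect (RM D.16); something like this, applied per LWT server task, is what I have in mind:

with System.Multiprocessors; use System.Multiprocessors;

procedure Pinned_Demo is
   --  Each Worker instance is pinned to the core given by its discriminant
   task type Worker (Core : CPU) with CPU => Core;

   task body Worker is
   begin
      null;  --  per-core slice of the workload would go here
   end Worker;

   W1 : Worker (Core => 1);
   W2 : Worker (Core => 2);
begin
   null;  --  the main subprogram waits here until both workers terminate
end Pinned_Demo;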

The OpenMP scheduler works better for me than Work_Stealing. Work_Stealing causes 6.5x more branch misses for the same workload.

Is it possible to get OpenMP to run on my GPU? I tried rebuilding the compiler with --enable-offload-targets=amdgcn-amdhsa and adding -foffload=amdgcn-amdhsa to my compile flags, but it still runs workers on the CPU. -fopenmp throws a warning saying Ada is not supported.

You are not the first person trying this out! Actually, AFAIK, LWT is implemented on top of OpenMP! There are also some papers on the topic beyond LWT: “Enabling Ada and OpenMP runtimes interoperability through template-based execution” (ScienceDirect).

I would LOVE for Ada to become OMP compatible, I really do. For the exact same reason you point out: imagine running Ada on your GPU! (Well… actually, some people did that years ago: https://github.com/AdaCore/cuda.) But AFAIK, the way Ada has been tied to OpenMP so far is by importing OMP function calls using the C calling convention and then calling those bound functions. This method does not scale and does not let OMP work in all of its glory, not by far: the source code is not annotated, it is not lowered to IR, and it is not compiled as OpenMP. Currently, Ada is just bound to the OMP library and that is it… which is quite sad.
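Concretely, the binding approach amounts to thin imports like this (a minimal sketch; link against libgomp, e.g. gnatmake omp_binding_demo.adb -largs -lgomp):

with Ada.Text_IO;
with Interfaces.C;

procedure OMP_Binding_Demo is
   --  A thin binding to the OpenMP C runtime. The compiler sees only an
   --  ordinary foreign function; none of OpenMP's semantics reach the IR.
   function omp_get_max_threads return Interfaces.C.int
     with Import, Convention => C, External_Name => "omp_get_max_threads";
begin
   Ada.Text_IO.Put_Line
     ("Max OpenMP threads:" & Interfaces.C.int'Image (omp_get_max_threads));
end OMP_Binding_Demo;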

But there is GNAT-LLVM, and I have been playing with it. It would be lovely if we added an aspect (it can be compiler dependent), something like with OpenMP => (block..., private x, y....), and that annotation would be lowered to IR just as OpenMP pragmas are in C/C++ (or directive comments in Fortran). The same could be done with OpenACC too, if preferred!
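Purely hypothetical, but since Ada 2022 already allows an aspect_specification on a parallel loop, I picture something like this (invented aspect and parameter names; no compiler accepts this today):

procedure Saxpy_Sketch is
   X, Y : array (1 .. 1_000) of Float := (others => 1.0);
begin
   --  HYPOTHETICAL: an implementation-defined OpenMP aspect that GNAT-LLVM
   --  would lower to OpenMP metadata in the LLVM IR
   parallel with OpenMP => (Schedule => Static)
   for I in X'Range loop
      X (I) := 2.0 * X (I) + Y (I);
   end loop;
end Saxpy_Sketch;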

Best regards,
Fer

@Richard-Wai does this use io_uring at all?

How is the submission going?

AFAIK, this is about raw parallelism, and the current implementation (LWT) is using OpenMP underneath. However, the Ada standard does not dictate what the underlying implementation has to be, so all options are open!

Best regards,
Fer
