Ada 2022 'parallel' implementation beta for FSF GCC/GNAT

I am very pleased to announce that the core Ada 2022 “parallel” features have been implemented for mainline FSF GNAT as part of a successful Google Summer of Code project. The patch is now ready for beta testing.

We are preparing to formally submit this patch to the FSF GCC project so that it can be incorporated into GCC trunk and, from there, all future FSF GCC releases. Before we make that submission, we would like to gather additional feedback from the Ada community.

This patch introduces most of the core capabilities of the parallel keyword, including:

  • Parallel loops

  • Parallel blocks

  • Early exit

  • Chunking

This patch does NOT yet support parallel iterators, but they are being actively worked on and will hopefully be completed soon.
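To give a flavor of the supported constructs, here is a minimal sketch of a parallel loop with chunking and a parallel block (illustrative names only; compile with -gnat2022):

procedure Demo is
   Data : array (1 .. 1_000) of Integer := (others => 1);
   A, B : Integer := 0;
begin
   --  Parallel loop, divided into at most 8 chunks
   parallel (8)
   for I in Data'Range loop
      Data (I) := Data (I) * 2;  --  iterations are independent: no data race
   end loop;

   --  Parallel block: the two arms may run concurrently
   parallel do
      A := 1;
   and
      B := 2;
   end do;
end Demo;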

The GSoC project work was hosted on the Ada Rapporteur Group’s own GCC mirror github repo, and the stable version of the parallel beta release currently lives at https://github.com/Ada-Rapporteur-Group/gcc-mirror/tree/devel/arg-proto/ada2022-parallel-release.

This branch can be built and bootstrapped as-is on most mainstream platforms using FSF GCC 15, with the standard build process. No additional libraries or build flags are needed, as the parallel features do not introduce any new compiler, runtime, or platform dependencies.

Additionally, Maxim Reznik has made available binary builds of the parallel support beta for most popular platforms. He also includes instructions on how to get going with Alire. Look for the “GCC with parallel PREVIEW” release at https://github.com/reznikmm/GNAT-FSF-builds/releases. Expand the “Assets” area at the bottom of the section to download binary builds for your platform.

By default, GNAT expands parallel loops and blocks into sequential (regular) loops and blocks, respectively. To get actual parallel execution of parallel constructs, an Ada “lightweight threading” library (formally, an Ada subsystem in the sense of Ada Reference Manual 10.1-3) must be present. GNAT detects the presence of the “LWT” subsystem at compile time and, if it is “withed”, generates calls to the LWT subsystem during expansion. It is therefore conventional for the unit containing the main subprogram to “with” LWT. Refer to the example programs included with the reference LWT subsystem to see how this works.
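In the simplest case, a main unit would look roughly like this (a minimal sketch, assuming, per the above, that with'ing the root LWT package is enough to enable parallel expansion):

with LWT;  --  makes the LWT subsystem visible; GNAT detects this at
           --  compile time and expands parallel constructs into LWT calls
procedure Main is
   Data : array (1 .. 100) of Integer := (others => 3);
begin
   parallel
   for I in Data'Range loop
      Data (I) := Data (I) ** 2;
   end loop;
end Main;

If LWT is not “withed”, the same unit still compiles, and the loop is simply expanded as a regular sequential loop.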

A reference LWT subsystem implementation currently lives under the ParaSail language project (https://github.com/parasail-lang/parasail). It may be separated from the ParaSail repository at a later time. This LWT implementation is also an official Alire crate of the same name, and Maxim’s instructions detail how to install LWT and use the beta toolchain under Alire.

The reference LWT subsystem could potentially be incorporated into the GNAT standard library (libgnat) at some point in the future, but it was decided to keep the first iteration as simple and digestible as possible.

This is only the first phase, and we look forward to additional refinements in the future!

***

Note that Alire is NOT required to beta test this build; simply making the sources of the LWT reference implementation available to the compiler is sufficient (using -I with gcc or gnatmake).

There are also some Ada 2022 parallel example programs under lwt/a22_examples, and these can be built and run with the FSF GNAT toolchain directly (no Alire) as follows (Linux/UNIX):

$ git clone https://github.com/parasail-lang/parasail
$ cd parasail/lwt/a22_examples
$ gnatmake -gnat2022 -I../ n_queens.adb
$ ./n_queens

***

We would love to have more members of the Ada community try out these features. Please let me know if you need any support getting set up, or if you have any other questions.

Can parallel be used in SPARK?

I’m having fun playing with this, thanks for everyone’s hard work!

I’ve built a somewhat contrived example of a game physics system that performs something equivalent to Position := Position + Velocity * Delta_Time; for 1 million entities. I’m iterating over large arrays of Float. It autovectorizes nicely on a single core before adding parallel.
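The hot loop is essentially this shape (a simplified sketch of my setup, with made-up names; the real version has separate components per axis):

procedure Physics_Step is
   type Float_Array is array (1 .. 1_000_000) of Float;
   type Float_Array_Ptr is access Float_Array;
   --  Heap-allocated so a million floats don't land on the stack
   Position   : constant Float_Array_Ptr := new Float_Array'(others => 0.0);
   Velocity   : constant Float_Array_Ptr := new Float_Array'(others => 1.0);
   Delta_Time : constant Float := 1.0 / 60.0;
begin
   parallel
   for I in Position'Range loop
      Position (I) := Position (I) + Velocity (I) * Delta_Time;
   end loop;
end Physics_Step;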

I’ve observed that everything is much faster if workers are pinned to specific CPU cores. For example, my laptop has an Intel 258V with 4x “Performance” P-cores and 4x “Efficiency” E-cores. My program runs much faster if pinned to the P-cores. If a worker gets scheduled on the E-cores, it causes the whole process to slow down. Even when restricting it to the P-cores with taskset, migrating between them degrades cache efficiency. Maybe this is out of scope for the runtime, but it seems like heterogeneous CPU configurations are commonplace and pinning each worker to a single CPU should be straightforward, if not an automatic default. I think this is something that would be implemented in LWT?
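For comparison, ordinary Ada tasks can already be pinned with the standard CPU aspect (RM D.16); something like this, applied per LWT server task, is what I have in mind:

with System.Multiprocessors; use System.Multiprocessors;

procedure Pinned_Demo is
   --  Each Worker instance is pinned to the core given by its discriminant
   task type Worker (Core : CPU) with CPU => Core;

   task body Worker is
   begin
      null;  --  per-core slice of the workload would go here
   end Worker;

   W1 : Worker (Core => 1);
   W2 : Worker (Core => 2);
begin
   null;  --  the main subprogram waits here until both workers terminate
end Pinned_Demo;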

The OpenMP scheduler works better for me than Work_Stealing. Work_Stealing causes 6.5x more branch misses for the same workload.

Is it possible to get OpenMP to run on my GPU? I tried rebuilding the compiler with --enable-offload-targets=amdgcn-amdhsa and adding -foffload=amdgcn-amdhsa to my compile flags, but it still runs workers on the CPU. -fopenmp throws a warning saying Ada is not supported.

You are not the first person trying this out! Actually, AFAIK, LWT is implemented on top of OpenMP! There are also some papers on the topic beyond LWT: “Enabling Ada and OpenMP runtimes interoperability through template-based execution” (ScienceDirect).

I would LOVE for Ada to become OMP compatible, I really do. For the exact same reason you point out: imagine running Ada on your GPU! (Well… actually, some people did that years ago: https://github.com/AdaCore/cuda.) But AFAIK, the way Ada has been tied to OpenMP so far is by importing OMP function calls using the C calling convention and then calling those bound functions. This method does not scale and does not let OMP work in all of its glory, not by far: the source code is not annotated, it is not lowered to IR, and it is not compiled as OpenMP. Currently, Ada is just bound to the OMP library and that is it… which is quite sad.
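Concretely, the binding approach amounts to thin imports like this (a minimal sketch; link against libgomp, e.g. gnatmake omp_binding_demo.adb -largs -lgomp):

with Ada.Text_IO;
with Interfaces.C;

procedure OMP_Binding_Demo is
   --  A thin binding to the OpenMP C runtime. The compiler sees only an
   --  ordinary foreign function; none of OpenMP's semantics reach the IR.
   function omp_get_max_threads return Interfaces.C.int
     with Import, Convention => C, External_Name => "omp_get_max_threads";
begin
   Ada.Text_IO.Put_Line
     ("Max OpenMP threads:" & Interfaces.C.int'Image (omp_get_max_threads));
end OMP_Binding_Demo;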

But there is GNAT-LLVM, and I have been playing with it. It would be lovely if we added an aspect (it can be compiler dependent), something like with OpenMP => (block..., private x, y....), and that annotation would be lowered to IR just as OpenMP pragmas are in C/C++ (or directive comments in Fortran). The same could be done with OpenACC too, if preferred!
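Purely hypothetical, but since Ada 2022 already allows an aspect_specification on a parallel loop, I picture something like this (invented aspect and parameter names; no compiler accepts this today):

procedure Saxpy_Sketch is
   X, Y : array (1 .. 1_000) of Float := (others => 1.0);
begin
   --  HYPOTHETICAL: an implementation-defined OpenMP aspect that GNAT-LLVM
   --  would lower to OpenMP metadata in the LLVM IR
   parallel with OpenMP => (Schedule => Static)
   for I in X'Range loop
      X (I) := 2.0 * X (I) + Y (I);
   end loop;
end Saxpy_Sketch;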

Best regards,
Fer

@Richard-Wai does this use io_uring at all?

How is the submission going?

AFAIK, this is about raw parallelism, and the current implementation (LWT) is using OpenMP underneath. However, the Ada standard does not dictate what the underlying implementation has to be, so all options are open!

Best regards,
Fer
