Parallel for-loops in Ada

leo-brewin · April 1, 2025, 8:27am

I’ve created a GitHub repository describing my experiments in using Ada’s tasking tools to run for-loops across multiple cpu cores. There are a handful of example codes using rendezvous calls and protected objects. I’m posting the link here as it might be useful for others wanting to play this game.

You can find the repository at github.com/leo-brewin/ada-parallel-for-loops.

You might also like to look at LWT, the Light Weight Threading library, from the ParaSail project. This is a much more serious attempt at this game than my little effort.

The LWT sources can be found here.

See also the announcement of LWT on the Ada-forum here
Lightweight Parallelism library based on Ada 2022 features.
A copy of the LWT sources can also be found in my repo (under examples/lwt).

Cheers,
Leo

sttaft · April 1, 2025, 12:47pm

Thanks for posting this! It’s great to see this sort of use of parallelism.

Also thanks for the nice references to the LWT library. One goal of the LWT library is to avoid creating “heavy weight” tasks more frequently than necessary, but instead reusing them. Your timings show that creating and finalizing tasks is probably not a significant time sink, so long as you do enough work with them. The other goal of the LWT library was to support the parallel execution of a heterogeneous collection of light-weight threads. It sounds like that is not too important in your application area, since you are mainly focused on parallel loops, rather than something that might need a more general divide-and-conquer approach.

Irvise · April 1, 2025, 8:40pm

THANK YOU!

I have always been missing some benchmarks on Ada’s tasking system and its (performance) behaviour! And also how it compared to other common parallelism models. LWT, afaik, uses OpenMP behind the scenes, which to me, it the parallelism standard (for a single CPU chip). It is great to see that tasks perform quite well! So thank you very very much for the tests

Best regards,
Fer

P.S.: as a LaTeX aficionado and I see you also use Cadabra, you just got a new follower ^^

sttaft · April 10, 2025, 1:20am

Actually, LWT uses either OpenMP or a home-grown work-stealing scheduler (which is often faster than OpenMP). LWT has a “plug-in” architecture so you can insert other light-weight-thread schedulers underneath, by "with"ing the package that defines the scheduler. If you don’t plug in any of them, LWT just runs the threads sequentially.

leo-brewin · April 10, 2025, 9:03am

I have to admit that my understanding of LWT is rather limited. So my apologies if I’ve made some errors or omissions in my comments about LWT. I’m quite happy to make any changes that you think might be appropriate. Also, thanks for the nice comments in your first post :).

dmitry-kazakov · April 10, 2025, 9:47am

If I correctly understood your code and results there is no difference between standard Ada tasking performance and whatever fancy library. Is that so?

I would also thankful for any analysis of implementations, since, naively thinking, if the fancy library is backed by the OS threads, then why on earth should anybody expect a result different from Ada tasking?

Some background. I implemented job service in Ada in Simple Components and used it for parallel arbitrary precision arithmetic. In particular multiplication and exponentiation under modulo allow parallelization into multiple jobs. The result benchmarking was rather disappointing. Montgomery single tasking implementation beats parallel algorithms (8 tasks) by margin.

Interestingly a lock-free implementation of the job queue performs worse than one based on Ada protected objects.

So, I am very interested in some deeper dive into the issue.

Thanks for your work.

leo-brewin · April 10, 2025, 11:04am

Sorry for the late reply. I’m pleased that you found my stuff useful. Thank you. I was also surprised to see, on an Ada site, that you’re a fan of LaTeX and Cadabra :). You might find a few of my other GitHub repos useful (one is a tutorial on Cadabra and another that allows Cadabra code and results to be embedded in a LaTeX source).

leo-brewin · April 10, 2025, 11:15am

My examples do not use any fancy libraries (other than the two examples that use the LWT library). The codes are very simple using just standard rendezvous calls and protected objects. I’ve tried to explain my design in pdf/for-loop-tasking.pdf. That’s a summary of my own understanding of multitasking in a Ada. I’m sure most of what’s in there is well known to most people in this group. There really is nothing new in my codes. They were written for one very particular case that I had (large scale processing of arrays of floating point data).

dmitry-kazakov · April 10, 2025, 12:45pm

Thanks for explanation. My case is not much different, jobs deal with large integer arrays. Instead of rendezvous I use protected job queue. A task runs infinite loop which takes a job from the queue and then dispatches to the job’s Execute operation. Synchronization is performed by the protected entry call on the job status (completed/failed).

I hoped that the light weight threading term has some substance, e.g. not just user managed threads, but something really faster.

Lucretia · April 10, 2025, 7:48pm

I thought OpenMP can be compiled for SMP as well as GPU.

Irvise · April 13, 2025, 12:24pm

Yes, modern OpenMP and OpenACC allow for device offload. I wish that the Parallel directive in Ada would have been more tightly integrated to those “frameworks”… There is no reason why it cannot, but it is a lot of work. From what I saw in LWP, it just calls OpenMP procedures directly (Import => __omp_XXX), it does not abstract itself over its model Therefore, the OpenMP model that LWT uses is a very very limited form of OpenMP.

sttaft · April 23, 2025, 6:21pm

Actually, there is no “direct” use of OpenMP in the LWT library. Everything is through a “plugin” architecture. There are currently effectively three plugins provided, one, which is the default, just runs things sequentially. A second uses the basic features of OpenMP to implement the plugin. The third uses pure Ada, plus a work-stealing scheduler implemented in Ada. Anyone could write an additional plugin, including one which takes advantage of OpenMP “device offloading” features, etc.

There was some question whether you could expect something more than is provided by Ada tasking. In fact, you can, because with Ada tasking, essentially every Ada task ends up as a rather expensive “kernel” thread. With light-weight threading, the logical threads of control are scheduled without any kernel interactions, using Ada tasks as “CPU servers” which service their work-stealing queue in a combination of LIFO for self-initiated threads, and FIFO when stealing threads from other servers. Work-stealing has been pretty extensively studied, and has been shown to provide excellent load balancing across physical CPUs, while also minimizing cache collisions between CPUs.