Hi all,
I am testing the power of the Ada type system, newer CPU features and the compiler’s ability to optimize code. I have an AMD Zen4 CPU, which supports AVX512 and BFloat16 instructions. I wanted to see whether GCC could automatically pick data representations and optimizations that use those newer features.
I have been using Compiler Explorer and GCC 15 to run these tests. You can find the code and setup here. The code is:
pragma Source_File_Name (Square, Body_File_Name => "example.adb");
pragma Ada_2022;
function Square(num : Integer) return Float is
   type My_Smol_Float is digits 6; -- CHANGE ME!
   type Index is range 1..300; -- CHANGE ME!
   Floaty1 : constant array (Index) of My_Smol_Float := (others => My_Smol_Float(num));
   Floaty2 : constant array (Index) of My_Smol_Float := (others => My_Smol_Float(num**2));
   Floaty_All : array (Index) of My_Smol_Float;
   Accumul1, Accumul2, Accumul3 : My_Smol_Float := 0.0;
begin
   for N of Floaty1 loop
      Accumul1 := @ + N;
   end loop;
   for I in Floaty1'Range loop
      Accumul2 := @ + Floaty1(I) + Floaty2(I);
   end loop;
   for I in Floaty1'Range loop
      Floaty_All(I) := Floaty1(I) * Floaty2(I);
   end loop;
   Accumul3 := Floaty_All'Reduce("*", 0.0);
   return Float(Accumul1 + Accumul2 + Accumul3);
end Square;
I was testing the difference between the different optimization levels and options.
I was mainly looking at:
- -O2 -gnatp
- -O3 -gnatp
- -O2 -march=znver4 -mtune=znver4 -gnatp
- -O3 -march=znver4 -mtune=znver4 -gnatp -fopt-info-vec-all
The compiler’s output for the last set of options is the one I care about the most. Here is what I found.
- GCC 15 fails at optimizing the 'Reduce operation. The -fopt-info-vec-all output does provide some hints, but I am not able to use them in any meaningful way as I do not understand these things all that well.
- GCC is able to generate half floats for low-precision Floats, though I think it is not able to generate 8-bit or 4-bit floats. Nor is it able to generate BFloat16.
- GCC can do half float streaming and easily optimizes and vectorizes loops for SIMD without a problem (even with -O2).
- GCC unrolls the loops into SIMD operations if the amount of data in the array is not too large, nice.
- GCC, while it is able to recognise that 6 digits of precision fit a half float, does not generate any SIMD arithmetic instructions specialized for it. It generates vaddss, which is for single-precision floats, instead of using the more advanced AVX2-FMA or AVX512FP16 arithmetic instructions (wiki info)…
  - This is strange, as it clearly understands that they are half precision and that the arithmetic operations that I am doing are available in my hardware… Is this one of those cases where Ada’s lack of undefined behaviour prevents some operations from being generated? I know that GNAT will not generate some instructions if the hardware implementation has some UB that is incompatible with Ada’s guarantees.
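To look at the 'Reduce case in isolation, it may help to put the reduction next to a hand-written loop and compare the two in the -fopt-info-vec-all output. The following is only a minimal sketch: Reduce_Sketch, Float_Array and Data are hypothetical names that are not part of the original test. It also starts the product from 1.0 instead of 0.0, which is an assumption about the intent, since a product that starts from 0.0 stays at zero for any finite input.

```ada
pragma Ada_2022;
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical standalone test, not part of the original Square function.
procedure Reduce_Sketch is
   type My_Smol_Float is digits 6;   --  same declaration as in Square
   type Index is range 1 .. 300;     --  same range as in Square
   type Float_Array is array (Index) of My_Smol_Float;

   Data : constant Float_Array := (others => 1.0);

   --  Reduction attribute, as in Square, but with 1.0 as the initial
   --  value (assumption: 1.0 is the neutral element for a product).
   Prod_Reduce : constant My_Smol_Float := Data'Reduce ("*", 1.0);

   --  Hand-written loop computing the same product.
   Prod_Loop : My_Smol_Float := 1.0;
begin
   for X of Data loop
      Prod_Loop := @ * X;
   end loop;

   Put_Line ("Reduce: " & My_Smol_Float'Image (Prod_Reduce));
   Put_Line ("Loop:   " & My_Smol_Float'Image (Prod_Loop));
end Reduce_Sketch;
```

Compiling this with the same flags as above should make it easier to see whether the two forms are treated differently by the vectorizer.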
Summary:
- Point 5 is the one that I was hoping would be automatically optimized. This would create a huge performance gain automagically.
- It is nice to see that GCC can automatically create half-floats from Ada’s code.
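One way to double-check which machine representation the compiler actually picked for a given digits value is to print a few attributes of the type. This is a minimal sketch, assuming a standalone procedure (Show_Float_Layout is a hypothetical name); the attributes themselves are standard Ada.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical helper, not part of the original test.
procedure Show_Float_Layout is
   type My_Smol_Float is digits 6;  --  change to match the type under test
begin
   --  Number of bits used to represent values of the type.
   Put_Line ("Size (bits):      " & Integer'Image (My_Smol_Float'Size));
   --  Decimal precision requested in the declaration.
   Put_Line ("Requested digits: " & Integer'Image (My_Smol_Float'Digits));
   --  Decimal precision of the underlying base type chosen by the compiler.
   Put_Line ("Base type digits: " & Integer'Image (My_Smol_Float'Base'Digits));
   --  Number of mantissa digits in the machine representation.
   Put_Line ("Machine mantissa: " & Integer'Image (My_Smol_Float'Machine_Mantissa));
end Show_Float_Layout;
```

Running it for a few different digits values shows where the compiler switches from one hardware format to the next, which can then be compared with the instructions seen in the generated assembly.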
Next episode: try the GNAT-LLVM compiler
Cheers,
Fer
EDIT: added link to some basic docs for AVX512 and deleted unneeded optimization flag.