[Solved] Improving performance on a vector: help needed

cantanima · May 30, 2024, 8:54am

Hello

I’m trying to implement the Ulam Number task on Rosetta Code. I have a working implementation but it is much, much slower than the usual suspects, despite being identical (as far as I can tell).

Investigation suggests this is due to the following loop:

for Ith in 1 .. Numbers.Last_Index - 1 loop
   Sieve (U + Numbers (Ith)) := Sieve (U + Numbers (Ith)) + 1;
end loop;

At one point, this loop takes 9 seconds on Ada, compared to a fraction of a second for the usual suspects. Here is the comparable code for one of the usual suspects:

        for (int i = 0; i < ulen - 1; ++i)
            ++sieve[u + ulams[i] - 1];

The entire program takes Ada a little more than 9 minutes, while the usual suspect takes 30 seconds. I understand this isn’t semantically equivalent, but a ~20x slowdown seems like a problem.

Some data that may or may not be relevant:

Numbers and Sieve are both Ada.Containers.Vector types whose elements are Positive.
The compiler tells me I can’t use Ada 2022’s @ symbol in the assignment because the left hand side evaluates to a reference. That reminds me that vectors aren’t arrays, and makes me suspect a lot of overhead behind the scenes.

Some ways I tried to mitigate this:

Change the loop to
```
for Number of Numbers loop
   Sieve (U + Number) := Sieve (U + Number) + 1;
end loop;
Sieve (U + Numbers.Last_Element) :=
            Sieve (U + Numbers.Last_Element) - 1;
```
That by itself cut the runtime of the one run of the loop from 9 seconds to 5 and confirms my suspicions at least partly.

(As an aside, it would be nice if there were a way to specify that I want a slice of the vector, so that I don’t have to add in that post-loop assignment – and I do have to add that to maintain correctness. We can do that with arrays; is there a way to do it with vectors?)
Use an array instead of a vector, reassigning the array on each iteration. I’m not even sure that would help, but even if it did, I must be working on this too late at night, because I know I’ve done something like this before, but:
- When I try to overwrite the old Sieve with the new one, a runtime error arises, I imagine because the old Sieve was initialized with a fixed size different from the new one.
- I tried using a discriminated type where the array’s length was the discriminant, but that gave me the same trouble.

Questions:

Am I using the vector incorrectly above? Is there a better library function to access and modify an element?
Can someone point me to an example where a discriminated record is used to create and reassign arrays, they way I’m sure I remember doing once before? or correct my memory by saying that cannot, in fact, be done?
I tried to initialize an array using the new Ada 2022 iterator approach, something like this:
```
--  Old_Size is the value of Old_Values, of smaller dimension than the one we're generating
[ for Ith in 1 .. Old_Size => Old_Values (Ith), others => 0 ]
```
…but the compiler complained that I had to use either a dynamic initialization or something else (can’t quite remember what). I tried to look this up in the ARM but didn’t see what it meant.
Suppose I want to intiialize a vector with a lot of 0’s. That doesn’t seem to be possible? That is, a vector is initialized and resized via Set_Length, but there’s no way to say, “I want all the values to be …”?
Any advice on eliminating the remaining bottleneck?

I haven’t posted the code anywhere online yet, but can if necessary.

jere · May 30, 2024, 12:30pm

Vectors do have a lot of internal checks (less so than other containers though), so you’ll probably need to disable them. I don’t have experience on how to do that, but I know that there is some mechanism for it. May have to look at the GNAT RM to see or take a look at their source code for the vector package and see if it is something obvious.

One thing I would recommend is to try the more obtuse way of Replace_Element and Element operations rather than indexed assignment. Indexed assignment generates a “reference type” which may have finalization happening in the background. Also use the versions that take indexes in rather than cursors (to avoid the extra cursor validity checks).

So Sieve (U + Number) := Sieve (U + Number) + 1;

becomes

Sieve.Replace_Element(U+Number, Sieve.Element(U+Number)+1);

That could remove a lot of the checks, but you are also duplicating the addition, so you might also consider doing the addition of U + Number just once and using a variable for the indexes. Note though, that it all depends on how GNAT optimizes the indexed assignment method, so I don’t know which is faster.

You still have checks for the U+Number potentially, so if you know they will never fail, you can optimize it with pragma Suppress

Yeah, you may need to use an indefinite holder to do that or do you own dynamic memory handling. I tend to prefer the indefintie holder since it does all the work for you, but it may be too slow depending on how you are using it.

function To_Vector
     (New_Item : Element_Type;
      Length   : Count_Type) return Vector
      with Pre  => Length <= Maximum_Length or else raise Constraint_Error,
           Post =>
              To_Vector'Result.Length = Length and then
              not Tampering_With_Elements_Prohibited (To_Vector'Result)
                and then
              not Tampering_With_Cursors_Prohibited (To_Vector'Result)
                and then
              To_Vector'Result.Capacity >= Length;

joakim-strandberg · May 30, 2024, 8:51pm

Tampering checks are known to cause some overhead. I would recommend suppressing them:

Replacing the standard vector implementation in the standard library with a vector implementation from the Ada traits containers might give better performance as well:

egilhh · May 30, 2024, 9:17pm

with Ada 2022, you could get a Stable view of the container instead.
Search for “stable” in the latest RM:
http://www.ada-auth.org/standards/2yrm/html/RM-A-18-2.html

pyj · May 30, 2024, 9:51pm

With performance it’s also best to check your compile settings and try a few profiling tools. Without the code and a profile I can’t suggest any improvements.

I would recommend running with Linux’s perf, or depending on your hardware, VTune or get a flame graph with AMD’s uProf.

cantanima · May 30, 2024, 11:18pm

Thanks to everyone. I had forgotten about suppressing checks. I didn’t even know you could suppress tampering checks. (Talk about something I’d have abused very much in the past…! )

Suppressing merely the tampering checks got us to 130 seconds.
Suppressing all checks got us down to roughly 90 seconds.

This was with -Og, so I also tried it with -O2.

Suppressing merely the tampering checks got us to 104 seconds.
Suppressing all checks got us down to roughly 20 seconds.

Nice! I’ll reply inmore detail later.

jere · May 30, 2024, 11:34pm

I don’t know if you blog at all, but it might be a fun article to write about and can show your experience tackling it.

cantanima · May 31, 2024, 2:16am

Alas, I have to resize this container on occasion, unless I know its final size in advance, which I don’t. That’s an interesting tool for future work, though.

cantanima · May 31, 2024, 2:18am

The tech lead at work uses flame graphs with Rust. I think he uses a package pre-baked for llvm. I wasn’t aware of uProf, and I’ve wanted to look more into this. Thanks.

egilhh · May 31, 2024, 6:20am

Then destroy (finalize) the stabilized view before resizing.

cantanima · May 31, 2024, 11:36am

I see what you mean. I hadn’t read far enough into it. Thanks; I will explore that.

JC001 · May 31, 2024, 11:48am

My initial approach would be to use Iterate over Numbers, and apply Update_Element to the corresponding element of Sieve. You’d probably need some extra effort to avoid processing the last element of Numbers.

egilhh · May 31, 2024, 12:53pm

Actually, don’t bother… I just tried to construct an example to post here, and it seems the Stable container view is another Ada 2022 feature not (yet?) implemented in GNAT