Interesting bug. Not sure if it makes sense to report (practically)

jere · May 16, 2025, 8:58pm

So someone from work was running into a really weird bug where their containers were seemingly getting corrupted but everything was still running. I spent some time looking at it and nailed it down to a dangling cursor to a list, which should have been caught (all checks and assertions were turned on).

So I started looking at the GNAT source for lists and the cursor vetting function is definitely not 100%, but then I’m not sure it’s possible to check 100%. So I don’t know if I report this as a bug, or just keep it in mind for the future.

The gist of the bug is that the person moved a list after taking cursors from it but forgot to invalidate the cursors to the list and still used them. They ended up swapping parts of one list into another by accident and it went pretty crazy but never raised an exception.

It’s obviously a logic error on their part, but I was hoping the compiler would catch it. I did some playing with it and depending on which dangling cursor it used, it sometimes would raise an exception, but not all of them. It just so happened the person in question got unlucky and used a cursor that didn’t raise an exception.

I can’t provide the exact code, but I hand typed out an example in godbolt that does similar things on a much smaller scale if anyone is interested:

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Containers.Doubly_Linked_Lists;

-- Type your code here, or load an example.
procedure Example is

   package Lists is new Ada.Containers.Doubly_Linked_Lists(Integer);

   v1 : Lists.List;
   v2 : Lists.List;

   c1 : Lists.Cursor;
   c2 : Lists.Cursor;

begin

   Put_Line("Hello World");

   for Value in 1 .. 10 loop
      v1.Append(Value);
      v2.Append(Value + 100);
   end loop;

   -- Get a cursor to a random element
   c1 := Lists.Next(Lists.Next(v2.First));

   -- This line causes the compiler to correctly catch the error
   -- and raise an exception. The vetting function checks head/tail
   --c1 := v2.First;  

   -- Move all elements from v2 into v1
   -- c1 now points to an element in v1 but still thinks it is in v2
   v1.Move(v2);

   -- Repopulate v2 with new elements
   for Value in 1 .. 10 loop
      v2.Append(Value + 200);
   end loop;

   -- v2 doesn't actually contain the element from c1
   Put_Line("c1 of v2 =>" & Integer'(v2.Constant_Reference(c1))'Image);

   -- Get another cursor from v2
   c2 := v2.First;

   -- Swap elements: should only work if both cursors point to elements
   -- in v2 as it calls v2.Has_Element() on both, but it sneaks by
   v2.Swap_Links(c1,c2);

   New_Line;
   Put_Line("v1:");
   for Item of v1 loop
      Put_Line(Item'Image);
   end loop;

   New_Line;
   Put_Line("v2:");
   for Item of v2 loop
      Put_Line(Item'Image);
   end loop;

end Example;

Compiled with -gnat2022 -gnata:

Compiler stdout

gcc -c -I/app/ -g -fdiagnostics-color=always -gnat2022 -gnata -I- <source>
gnatbind -x example.ali
gnatlink example.ali -g -fdiagnostics-color=always -o /app/example

Program returned: 0
Program stdout

Hello World
c1 of v2 => 103

v1:
 101
 102
 201
 104
 105
 106
 107
 108
 109
 110

v2:
 103
 202
 203
 204
 205
 206
 207
 208
 209
 210

sidisyom · May 18, 2025, 8:58pm

Following through Move for a bit, it doesn’t look like that is doing anything special for any cursors that might have been created for the source list (how could it know that anyway..). It just rearranges the First and Last pointers and sorts out the length.

So, when Swap is called the Node_Access of c1 is within the sequences of Node_Access’s of v1 but the container is still pointing to v2 and c2 is naturally a fresh cursor pointing to v2 so the container vetting passes and the two elements are swapped.

Maybe a list would benefit from some structures/logic to keep track of all the cursors that have been created off it?

jere · May 18, 2025, 9:33pm

On my own end, two potential options come to mind but both have plusses and minuses.

Each element keeps track of which container it is in. This increases space by quite a bit (access type for each element), but that can be mitigated by eliminating the access object when you disable tamper checks so the optimized version is more streamlined. This allows Move to work as the RM specifies and you detect the problem when checking the cursor.
Keep an overall count of all cursors/references exist against the container as a whole and prevent moving if that value is non zero (a list works here as well). This takes up way less space than #1 but it means you restrict the ability to Move unless no cursors or references exist, which isn’t as RM friendly. But it also makes better sense to me from a check standpoint where you want to catch the potential error as early as possible. Though it also adds more overhead to cursors when tamper checks are turned on.

I wondered if I should even report it as a bug in GNAT. It’s really more of a specific design decision on how to vet the cursors, so it feels odd to report it.

sidisyom · May 18, 2025, 9:52pm

#2 sounds reasonable (it feels similar to reference counting in some gc algorithms). Though, like you say, it really depends on what the desired behaviour is from an RM perspective i.e. should a cursor move along with a Move or should that be considered a (user) error.

OneWingedShark · May 18, 2025, 11:47pm

Hm, is it a logic error though?
Consider this, we have two lists (A, B, C, D) & (1,2,3,4), we take a few cursors —$, which references 3; #, which references C; and @ which references D— we then insert the second list into the first, yielding (A, B, C, 1, 2, 3, 4, D) — does this invalidate $? moreover, should next(#) still be D? or 1? Likewise, should pred(@) be 4, or C?

jere · May 19, 2025, 12:29am

In their case they weren’t intending for the cursor to point to the old element, so probably a logic error. That’s not to say you couldn’t craft a requirement that does want to have it still point to that element in another list, that just wasn’t their aim in this case.

On the RM side, a lot of the operations check Has_Element (Container, I) as part of their preconditions which is supposed to say if the element belongs to that specific container, so it feels like the RM expects the container change to be caught (and indeed it does catch it if it ends up being the cursor to the head or tail element, just not some of the cursors to elements in between).

Do you remember from ARG discussions what the intent would be in this case? I feel like the preconditions imply it’s an error, but I’m also only a layman in this kind of stuff.

For ref, see the precondition on Swap_Links: http://www.ada-auth.org/standards/22rm/html/RM-A-18-3.html#I7600

OneWingedShark · May 19, 2025, 2:22am

I’m sorry, I don’t remember the discussion on the containers.
Which is why I asked the question: there’s good arguments that can be made that a cursor* would “follow” a container that was included/appended into another without error… there’s also argument that it should invalidate the cursor, or otherwise be precluded.

* — Conceptually a cursor is a reference, albeit bound to the container from whence it comes. The issue then becomes that of management, bookkeeping, if you will. It may well be undefined, as a state which has not been considered.