Safety of returning an array with Address elsewhere

jcmoyer · December 13, 2024, 4:32am

I’ve also found another very curious approach used by GNATCOLL.Mmap:

subtype Unconstrained_String is String (Positive);
type Str_Access is access all Unconstrained_String;
pragma No_Strict_Aliasing (Str_Access);

Then you can take the System.Address of the buffer and convert it to a Str_Access with a normal Unchecked_Conversion. I guess that by making Unconstrained_String an array over all values of Positive you effectively have unlimited bounds, which is precisely the escape hatch I’ve been looking for. You can re-add the bounds by other means.
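
As a rough illustration of the approach described above, here is a minimal sketch. The three declarations are the ones quoted from GNATCOLL.Mmap; everything else (View_Demo, To_Str_Access, Buffer, Length, View) is a hypothetical name made up for the example, and a local String stands in for a real mapped file. It assumes GNAT, where an access to a constrained array subtype has the same size as System.Address.

```ada
with Ada.Text_IO;
with Ada.Unchecked_Conversion;
with System;

procedure View_Demo is
   --  Declarations as quoted from GNATCOLL.Mmap.
   subtype Unconstrained_String is String (Positive);
   type Str_Access is access all Unconstrained_String;
   pragma No_Strict_Aliasing (Str_Access);

   --  Hypothetical helper: plain Unchecked_Conversion from an address.
   function To_Str_Access is
     new Ada.Unchecked_Conversion (System.Address, Str_Access);

   --  Hypothetical stand-in for a mapped file or I/O buffer.
   Buffer : String := "hello, world";
   Length : constant Natural := Buffer'Length;

   View : constant Str_Access := To_Str_Access (Buffer'Address);
begin
   --  The real bounds are "re-added" by slicing; indexing past Length
   --  would read memory that does not belong to Buffer.
   Ada.Text_IO.Put_Line (View (1 .. Length));
   Ada.Text_IO.Put_Line (View (8 .. Length));
end View_Demo;
```

Note that View'Range is still 1 .. Positive'Last here, so the valid length has to be tracked separately (Length in this sketch).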

jere · December 13, 2024, 4:53am

Just be aware you need to be very careful there. The types you convert between must have access types that match pretty closely. Access types that point to unconstrained arrays expect bounds of specific sizes next to the memory. If you convert from one type that doesn’t even have bounds to one that does (or from a type with different sized bounds) then you’ll have invalid memory accesses which may or may not immediately manifest.

jcmoyer · December 13, 2024, 5:59pm

I am confused about the terminology here. I thought an unconstrained array was one with an unknown bound, like type String is array (Positive range <>) of Character, where if you receive a String as a parameter, for example, you can’t possibly know its bounds without loading them from memory.

Is Unconstrained_String (name taken directly from GNATCOLL) in this example actually an unconstrained array too? I thought it became constrained by making the range all of Positive and that it’s “unconstrained” in the sense that the actual bounds can’t be determined by the usual means like 'Range.

OneWingedShark · December 13, 2024, 6:58pm

Mm, that’s just asking for trouble; S : Unconstrained_String; is going to try to allocate Positive'Last Character elements on the stack.

This is correct: String has, as valid values, all of “”, “This”, “That”, and “This other long string.” This ambiguity/“unknownness” [of 'Range/'Length] prevents allocating such an object absent an initial value or a subtype defining the constraints.

The name is very misleading; as-is, the subtype is an array of Positive'First..Positive'Last Character-elements. Just as you say.

I would not recommend using this subtype, ever.
I’m not a fan of GNATCOLL in general, and would strongly recommend considering very carefully exactly what depending on it would entail, and its implications. (Several other Ada users have complained that compiling GNATCOLL can itself be a challenge, just like bootstrapping GNAT.)

jere · December 13, 2024, 7:46pm

The type is unconstrained, but when you allocate a specific variable of that type, the compiler will insert some sort of bounds into memory next to the array elements so that references and pointers to that memory and attributes of the object can retrieve the correct parameters (like 'First, 'Last, 'Length, etc).

If the access type that you use points to an object of an unconstrained type, the compiler knows to look for those bounds based on where the compiler would place them.

Alternately, for a constrained type:

subtype Bounded_String is String (1 .. 100);
type String_Access is access Bounded_String;
Value : String_Access := new Bounded_String;

The compiler doesn’t need to put the bounds next to the object (the object that Value designates in this example) because it knows all of the values at compile time. So the access type here doesn’t necessarily need to look for bounds next to the data.

So when you convert between Addresses and Access types, or between different access types, you need to be aware of this so you don’t inadvertently point an Access type that expects an unconstrained array type to a constrained object with no bounds.

Similarly, if the object types the two Access types refer to are technically different, the way the bounds are saved next to the object could be different (different sizes or orientation), so the conversion could lead to using incorrect bounds.
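
To make the distinction above a little more concrete, here is a small sketch that prints the size of an access-to-constrained type next to an access-to-unconstrained type, and reads the bounds of an allocated String. The names Bounds_Demo, Bounded_Access, and Dynamic_Access are hypothetical, and the printed sizes are implementation details: whether the two access types differ in size, and where the bounds actually live, is up to the compiler.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Bounds_Demo is
   subtype Bounded_String is String (1 .. 100);

   --  Designated subtype is constrained: bounds are known statically.
   type Bounded_Access is access Bounded_String;

   --  Designated subtype is unconstrained: bounds must be found at run time.
   type Dynamic_Access is access String;

   B : constant Bounded_Access := new Bounded_String'(others => ' ');
   D : constant Dynamic_Access := new String'("abc");
begin
   --  Implementation-specific: the two sizes may or may not be equal.
   Put_Line ("Bounded_Access'Size =" & Integer'Image (Bounded_Access'Size));
   Put_Line ("Dynamic_Access'Size =" & Integer'Image (Dynamic_Access'Size));

   --  For B the bounds are compile-time constants; for D they are
   --  retrieved from wherever the compiler stored them for this object.
   Put_Line ("B'Length =" & Integer'Image (B'Length));
   Put_Line ("D'First  =" & Integer'Image (D'First));
   Put_Line ("D'Last   =" & Integer'Image (D'Last));
end Bounds_Demo;
```

If the two sizes differ on your compiler, an Unchecked_Conversion between the two access types cannot be a simple reinterpretation, which is one way the mismatch described above shows up.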

jcmoyer · December 14, 2024, 4:20am

That is true, but to be fair I don’t think the intent is for the user to ever use Unconstrained_String directly. As far as I can tell, the API is designed to let you view the mapped file through the access type instead, solely for the convenience of being able to treat it like a String.

Thanks for clarifying.

That explains a lot. I’m not really interested in using GNATCOLL. I am mostly just curious about how much Ada lets you cut through the safety features.

Alright, that all makes sense to me. Thanks!

As far as I can tell reading disassembly (GNAT, x86-64, implementation details nobody should rely on):

When you allocate a constrained array, bounds aren’t part of the allocation. They become read-only data elsewhere in the executable exactly like @jere mentioned.

When you allocate an unconstrained array, bounds are placed directly ahead of array contents. Since index types can have different bit widths, the actual array contents may start at [ptr+8], [ptr+16], or maybe even something else.

When calling a function that takes an unconstrained array, a pointer to the array contents is passed as one parameter, and its bounds are passed as a separate parameter. Since the compiler knows at the callsite what type you’re passing, it can determine whether to load the bounds through the access or from rodata, and how wide those loads should be. This is the key point.

All these possible variations of placement, widths, etc. explain why my earlier unchecked conversion between access Stream_Element_Array and String failed.
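
The callsite behaviour described above can be observed from the language side without reading disassembly. The following is only a sketch with hypothetical names (Callsite_Demo, Show, Fixed): the same subprogram receives different bounds depending on what the caller passes, even when the characters come from the same constrained object.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Callsite_Demo is
   --  S is of the unconstrained type String, so its bounds are supplied
   --  by each caller rather than being fixed by the parameter's subtype.
   procedure Show (S : String) is
   begin
      Put_Line (Integer'Image (S'First) & " .." & Integer'Image (S'Last)
                & " : " & S);
   end Show;

   Fixed : constant String (1 .. 11) := "hello world";
begin
   Show (Fixed);            --  bounds 1 .. 11, known statically
   Show (Fixed (7 .. 11));  --  bounds 7 .. 11, taken from the slice
   Show ("literal");        --  bounds 1 .. 7, from the literal
end Callsite_Demo;
```

How the bounds are physically passed (separate register, pointer to a bounds record, and so on) remains a compiler detail; the sketch only shows that they come from the callsite.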

OneWingedShark · December 14, 2024, 5:08am

There are features that are inherently you-need-to-know-what-you’re-doing “unsafe” (and even this version of “unsafe” is typically less dangerous than, say, C’s you-mistyped-a-symbol-gotcha! style of “unsafe”): Unchecked_Deallocation (it sets the access variable to null, and deallocating a null access does nothing, so freeing the same variable twice is harmless; other access values designating the freed object are still left dangling), Unchecked_Conversion (telling the compiler to use value X of type Y as type Z), overlays (Import plus an Address clause; essentially Unchecked_Conversion), representation clauses (telling the compiler “use this memory layout”), and a few other low-level features.
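
For readers who have not seen these two features side by side, here is a minimal sketch of Unchecked_Conversion next to an overlay. All names (Unsafe_Demo, Bytes, To_Bytes, Word, Overlay) are hypothetical, and the sketch assumes the compiler lays out Bytes as four contiguous 8-bit components so that it has the same size as Unsigned_32, which is the case with GNAT.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Unchecked_Conversion;
with Interfaces;  use Interfaces;

procedure Unsafe_Demo is
   --  Assumed to occupy exactly 32 bits, matching Unsigned_32.
   type Bytes is array (1 .. 4) of Unsigned_8;

   --  Unchecked_Conversion: reinterpret a copy of the value.
   function To_Bytes is new Ada.Unchecked_Conversion (Unsigned_32, Bytes);

   Word : Unsigned_32 := 16#DEAD_BEEF#;

   --  Overlay: a second view of the same storage. Import suppresses
   --  default initialization; Address places it on top of Word.
   Overlay : Bytes with Import, Address => Word'Address;
begin
   --  Both mechanisms expose the same underlying bytes; their order
   --  depends on the target's endianness.
   if To_Bytes (Word) = Overlay then
      Put_Line ("conversion and overlay see the same bytes");
   end if;
end Unsafe_Demo;
```

The practical difference is that the conversion yields a copy, while the overlay aliases the original object, so the overlay is the one that needs more care.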

Insofar as memory goes, this video is excellent. I have two posts on unconstrained arrays/strings (here & here) that should be informative about one of the big, fundamental differences between Ada and other languages.

The key to really using Ada well is this: use the type-system to model your problem, then use that to solve your problem. There is a little bit of an art in designing, but that’s generally true everywhere.