N to M Element Array Iteration in Ada

alexmanning9904 · December 12, 2024, 7:13am

Howdy all - I’m currently pretty new to Ada and working on some sample projects to teach myself. As part of this, I’m trying to build a subprogram to convert an array of bytes to a string of Base64 Characters per the algorithm specified in RFC4648, which operates on groups of 3 bytes to produce 4 characters. I have functional code for this (ignoring padding bytes), but I already feel that it is very inelegant, and I’m also having a lot of trouble getting it to pass gnatprove. I’d like to focus on the inelegancies for now. Here is what I have come up with:

function Bytes_To_B64_String (Input : Bytes) return B64_String is
      Input_Iterator  : Integer range Input'First .. Input'Last + 1 := Input'First;
      Output          : B64_String (1 .. (Input'Length / 3 * 4)) := (others => '='); -- Output String
      Output_Iterator : Integer range Output'First .. Output'Last + 1 := Output'First;

      Byte_Trio : Bytes (1 .. 3) := [others => 0];
      Byte_Quad : Bytes (1 .. 4) := [others => 0];
      B64_Quad  : B64_String (1 .. 4) := [others => '='];
   begin

         while Input_Iterator <= Input'Last - 2 loop

            Byte_Trio (1) := Input (Input_Iterator);
            Byte_Trio (2) := Input (Input_Iterator + 1);
            Byte_Trio (3) := Input (Input_Iterator + 2);

            Byte_Quad (1) := Shift_Right (Byte_Trio (1), 2) and 2#00111111#;
            Byte_Quad (2) := (Shift_Left (Byte_Trio (1), 4) or Shift_Right (Byte_Trio (2), 4)) and 2#00111111#;
            Byte_Quad (3) := (Shift_Left (Byte_Trio (2), 2) or Shift_Right (Byte_Trio (3), 6)) and 2#00111111#;
            Byte_Quad (4) := (Byte_Trio (3) and 2#00111111#);

            B64_Quad (1) := Byte_To_B64_Character_Lookup (Byte_Quad (1));
            B64_Quad (2) := Byte_To_B64_Character_Lookup (Byte_Quad (2));
            B64_Quad (3) := Byte_To_B64_Character_Lookup (Byte_Quad (3));
            B64_Quad (4) := Byte_To_B64_Character_Lookup (Byte_Quad (4));

            Output (Output_Iterator) := B64_Quad (1);
            Output (Output_Iterator + 1) := B64_Quad (2);
            Output (Output_Iterator + 2) := B64_Quad (3);
            Output (Output_Iterator + 3) := B64_Quad (4);

            Input_Iterator := Input_Iterator + 3;
            Output_Iterator := Output_Iterator + 4;
         end loop;

      return Output;
   end Bytes_To_B64_String;

My first primary complaint in this is the use of a while loop to iterate over elements. Is there Ada syntax that will allow me to iterate over the input array in groups of 3? In C, for example,
for (int i = 0; i < len(foo); i += 3).

Second, I am not happy that I gave Input_Iterator and Output_Iterator ranges of N+1 to avoid overflows when they are incremented in the last execution of the loop. If I can’t avoid the for loop, is there at least a clean way to avoid this that I’m not seeing?

Thanks in advance for any help; I’m enjoying working with Ada, but I still think I’m fighting the language a bit.

JC001 · December 12, 2024, 12:07pm

Some general comments first: At the Ada Launch (1980-12-10), Ichbiah said that Ada included while solely to facilitate translation from languages such as Pascal, but it should never be used in new code. There are two psychological reasons for this: People find the exit condition more intuitive than the continuation condition (I know it took me a while to wrap my brain around having to invert the obvious condition to write while loops), and while loops often involve negative logic, which is harder to understand than positive logic.

while not EOF loop
-- compared to
loop 
   exit when EOF;

Ichbiah went on to violate this rule in published examples of Ada, and my personal rule is to use whatever is easiest to read, so use exit when EOF; but also while More_Data loop. In your case there’s not much difference in the two conditions, so the “prefer exit to while” rule should probably apply.

V : T; is read “V is a T”.

I : Integer; -- I is an Integer
F : Float;   -- F is a Float
B : Bytes;   -- B is a Bytes

“B is a Bytes” is nonsense, which is why you should never use plurals for type names. What you’re calling “Bytes” is a sequence of bytes, and other words for “sequence” include “list” and “string”, so I’d call this “Byte_List”.

Since your work variables (Byte_* and B64_Quad) are always assigned to before being read, they need not be initialized, though perhaps that’s needed to keep SPARK happy.

Regarding your first question, no, Ada does not have a counting loop with steps other than 1 or -1. Having seen some of the perversions that C for loops have been used for, this is generally a good thing.

Regarding your second question, I’d just use Positive. I presume you have a precondition that results in Output’Last < Integer’Last.

Regarding “inelegance”, much of what you’re calling inelegance stems from the fact that you’re doing everything in one subprogram: calculating the output length, keeping track of where you are in Input and where to put things in Output, converting 3 bytes into 4 characters. Some more comes from not using some language features when possible:

Byte_Trio := Input (Input_Iterator .. Input_Iterator + 2);
...
Lookup : for i in Byte_Quad'range loop
   Output (Output_Iterator + I - 1) := Byte_To_B64_Character_Lookup (Byte_Quad (I) );
end loop Lookup;

Breaking things up should make your code clearer and easier to get right. I’d probably do something like

subtype Triplet is Byte_List (1 .. 3);
subtype Quad is B64_String (1 .. 4);

function To_B64 (Trio : in Triplet) return Quad;
-- Implementation left as an exercise for the reader

function As_B64 (List : in Byte_List) return B64_String with
   Pre  => 4 * List'Length / 3 < Integer'Last,
   Post => As_B64'Result'First = 1 and As_B64'Result'Length =  4 * List'Length / 3;

function As_B64 (List : in Byte_List) return B64_String is
   (if List'Length < Triplet'Length then ""
    else To_B64 (List (List'First .. List'First + Triplet'Length - 1) ) &
         As_B64 (List (List'First + Triplet'Length .. List'Last) ) );

I have let the compiler keep track of many of the things that you’re explicitly dealing with. This simplifies the code.

Adding padding to List to get a length that’s a multiple of 3 will complicate things a bit, mostly in the pre- and postconditions.

OneWingedShark · December 12, 2024, 4:09pm

I have an implementation for Base64 here. Do note, this does have some design that was working around bugs in the compiler.

You can use for, in fact, I’d say that for conversions you want to have a good model of the source and target types, and then work from there — remember, in Ada the best way to use the language is to use the type-system to model your problem, then use that to solve your problem.

In the link provided, there’s a charact

There are several methods that could be used, perhaps the cleanest would be to flag if/how-much post-operation doesn’t “fit” into the loop’s operation, and make a subtype for that, special-casing the “cleanup” as needed:

-- Return the input, but with all but the first trailing period replaced with '$'.
Function Example( S: String ) return String is
   Use Ada.Strings.Fixed;
   Function Get_Tail return Natural is
   Begin
     if S'Length not in Positive then Return 0;
     elsif S(S'Last) /= '.' then Return 0;
     else
        Return Result : Natural := S'Last do
           while Result in Natural and then S(Result) = '.' loop
               Result:= Natural'Pred(Result);
           end loop;
           -- Fixup for including single period.
           Result:= Natural'Succ(Result);
        end return;
     end if;
   End; 

   Tail : Natural renames Get_Tail;
    Subtype Internal_Index is Positive range S'First..(if Tail in Positive then Tail else S'Last);
Begin
   Return Result : String(S'Range):= S(Internal_Index) & (Tail+1..S'Last => '$');
End Example;

It’s a bit contrived, but shows how you can use subtypes, and slicing, to construct an answer.

simonjwright · December 12, 2024, 5:42pm

While playing around with this I found that

   subtype Base_64_Character is Character with
       Static_Predicate =>
        Base_64_Character in
          'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '+' | '/' | '=';

   Base_64_Alphabet_Base : constant array (0 .. 64) of Base_64_Character :=
     (0  => 'A', 1 => 'B', 2 => 'C', 3 => 'D', 4 => 'E', 5 => 'F', 6 => 'G',
      7 => 'H', 8 => 'I', 9 => 'J', 10 => 'K', 11 => 'L', 12 => 'M', 13 => 'N',
      14 => 'O', 15 => 'P', 16 => 'Q', 17 => 'R', 18 => 'S', 19 => 'T',
      20 => 'U', 21 => 'V', 22 => 'W', 23 => 'X', 24 => 'Y', 25 => 'Z',
      26 => 'a', 27 => 'b', 28 => 'c', 29 => 'd', 30 => 'e', 31 => 'f',
      32 => 'g', 33 => 'h', 34 => 'i', 35 => 'j', 36 => 'k', 37 => 'l',
      38 => 'm', 39 => 'n', 40 => 'o', 41 => 'p', 42 => 'q', 43 => 'r',
      44 => 's', 45 => 't', 46 => 'u', 47 => 'v', 48 => 'w', 49 => 'x',
      50 => 'y', 51 => 'z', 52 => '0', 53 => '1', 54 => '2', 55 => '3',
      56 => '4', 57 => '5', 58 => '6', 59 => '7', 60 => '8', 61 => '9',
      62 => '+', 63 => '/', 64 => '=');

will make the compiler fail with Stack_Overflow (if you wait long enough).

Likewise with GNAT’s extension, plain Predicate.

Using Dynamic_Predicate is OK.

sidisyom · December 15, 2024, 7:51pm

If dynamic predicates in loops were allowed then the equivalent of that for-loop could be written somewhat more elegantly (imo) like so:

procedure Triplets_Looper is
   subtype Triplet is Natural
     with Dynamic_Predicate => Triplet mod 3 = 0;
begin
   for T in Triplet loop
      exit when T >= Len (Foo);
      --   logic here
   end loop;
end Triplets_Looper;

but that isn’t allowed (only Static_Predicate allowed in loops).

So, one option is to hand-roll that:

for N in Natural loop
    exit when N >= Len (Foo);
    if N mod 3 = 0 then
      null; --   logic here
    end if;
end loop;

cantanima · December 15, 2024, 9:17pm

How about this?

for N in Natural when N mod 3 = 0 loop
    exit when N >= Len (Foo);
    null; --   logic here
end loop;

(I haven’t tested it.)

If Ada had something like Python’s enumerate this would be something more generally usable.