How best to sort an array of records

codiac · March 21, 2025, 2:13am

Yes, I think that I was expecting it to automatically generate a “>” operator. Got it now, thanks to you, OneWingedShark, and dmitry-kazakov.

OneWingedShark · March 21, 2025, 3:19am

Oh, thank you.
I’m glad it helps.

codiac · March 21, 2025, 3:37am

In my program I’ve been working–that sparked this whole discussion–I am trying to populate an array of records. I read the file line by line and parsed each of those lines by a delimiter. I don’t know the size of the string for each delimiter so that is why I was using unbounded strings. Is there a way around this? I guess I could figure out the max size each delimited entry could potentially be and then allocate memory for that. For some reason I thought that this wasn’t going to work but it did! I thought I had to use unbounded strings because of the variable string length. I appreciate the tip, as I don’t like wasting memory/space and like to be efficient.

Open (File_Handle, Mode => In_File, Name => In_File_Name);

type Order_Entry is record
Order_Name : Wide_String(1…60); --name is usually 3 times longer that status
Order_Status : Wide_String(1…20);
end record;

Order_Array : array (Integer range 1 … 5) of Order_Entry;

while not End_Of_File (File_Handle) loop
Parsed_Field_List :=
PragmARC.Line_Fields.Parsed (Get_Line (Mega_File_Handle), ‘|’);
Order_Array (Index).Order_Name := Parsed_Field_List (1);
Order_Array (Index).Order_Status := Parsed_Field_List (2);
Index := Index + 1;
end loop;

dmitry-kazakov · March 21, 2025, 12:03pm

Here is an implementation I meant:

kept sorted
has embedded indices: by name, by status
no memory overhead

You can walk the list either in names order or status order.

The name index is used to keep elements. It ignores Status in comparisons, so name is unique.

The status index is compares first status and then name.

with Ada.Finalization;
with Generic_Set;

package Order_Lists is
   type Order_Entry (Name_Length, Status_Length : Natural) is record
      Name   : String (1..Name_Length);
      Status : String (1..Status_Length);
   end record;
   type Order_List is tagged limited private;
   procedure Add (List : in out Order_List; Name, Status : String);
   function Get_By_Name (List : Order_List; Index : Positive)
      return Order_Entry;
   function Get_By_Status (List : Order_List; Index : Positive)
      return Order_Entry;
   function Get_Size (List : Order_List) return Natural;
private
   type By_Name_Ptr is access Order_Entry;
   function "<" (Left, Right : By_Name_Ptr) return Boolean;
   function Equal (Left, Right : By_Name_Ptr) return Boolean;
   package By_Name_Sets is new Generic_Set (By_Name_Ptr, null, "=" => Equal);

   type By_Status_Ptr is access all Order_Entry;
   function "<" (Left, Right : By_Status_Ptr) return Boolean;
   package By_Status_Sets is new Generic_Set (By_Status_Ptr, null);

   type Order_List is
      new Ada.Finalization.Limited_Controlled with
   record
      By_Name   : By_Name_Sets.Set;
      By_Status : By_Status_Sets.Set;
   end record;

   overriding procedure Finalize (List : in out Order_List);
end Order_Lists;

The body

with Ada.Unchecked_Deallocation;

package body Order_Lists is
   procedure Free is
      new Ada.Unchecked_Deallocation (Order_Entry, By_Name_Ptr);

   function "<" (Left, Right : By_Name_Ptr) return Boolean is
   begin
      if Left = null then
         return Right /= null;
      elsif Right = null then
         return False;
      else
         return Left.Name < Right.Name;
      end if;
   end "<";

   function Equal (Left, Right : By_Name_Ptr) return Boolean is
   begin
      if Left = null then
         return Right = null;
      elsif Right = null then
         return False;
      else
         return Left.Name = Right.Name;
      end if;
   end Equal;

   function "<" (Left, Right : By_Status_Ptr) return Boolean is
   begin
      if Left = null then
         return Right /= null;
      elsif Right = null then
         return False;
      elsif Left.Status = Right.Status then
         return Left.Name < Right.Name;
      else
         return Left.Status < Right.Status;
      end if;
   end "<";
   function "=" (Left, Right : By_Status_Ptr) return Boolean is
   begin
      return Left.all = Right.all;
   end "=";

   procedure Add (List : in out Order_List; Name, Status : String) is
      Item : By_Name_Ptr :=
         new Order_Entry'(Name'Length, Status'Length, Name, Status);
      begin
      if List.By_Name.Is_In (Item) then
         Free (Item);
         raise Constraint_Error with Name & " is already in the list";
      end if;
      List.By_Name.Add (Item);
      List.By_Status.Add (Item.all'Unchecked_Access);
   end Add;

   function Get_By_Name (List : Order_List; Index : Positive)
      return Order_Entry is
   begin
      return List.By_Name.Get (Index).all;
   end Get_By_Name;

   function Get_By_Status (List : Order_List; Index : Positive)
      return Order_Entry is
   begin
      return List.By_Status.Get (Index).all;
   end Get_By_Status;

   function Get_Size (List : Order_List) return Natural is
   begin
      return List.By_Name.Get_Size;
   end Get_Size;

   procedure Finalize (List : in out Order_List) is
      Item : By_Name_Ptr;
   begin
      for Index in 1..List.By_Name.Get_Size loop
         Item := List.By_Name.Get (Index);
         Free (Item);
      end loop;
   end Finalize;

end Order_Lists;

dmitry-kazakov · March 21, 2025, 12:18pm

This is a test program. It is a bit long because it contains full error diagnostics. It also illustrates how to avoid tokenizing, i.e. there is never any need to determine fields or to copy them, even if the field is a string.

It reads file and prints the content sorted by status.

Note, there is no End_Of_File test.

with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Command_Line;       use Ada.Command_Line;
with Ada.Exceptions;         use Ada.Exceptions;
with Strings_Edit;           use Strings_Edit;
with Strings_Edit.Integers;  use Strings_Edit.Integers;

with Order_Lists;

procedure Test is

   function Where (Line, Column : Positive) return String is
   begin
      return Image (Line) & ':' & Image (Column);
   end Where;

   File   : File_Type;
   Orders : Order_Lists.Order_List;
   Count  : Natural := 0;
begin
   case Argument_Count is
      when 1 =>
         begin
            Open (File, In_File, Argument (1));
         exception
            when others =>
               Put_Line ("Cannot open file: " & Argument (1));
               Set_Exit_Status (Failure);
               return;
         end;
      when others =>
         Put_Line ("Usage: test <file-name>");
         Set_Exit_Status (Failure);
         return;
   end case;
   begin
      loop
         Count := Count + 1;
         declare
            Line         : constant String := Get_Line (File);
            Pointer      : Integer := Line'First;
            Name_Start   : Integer;
            Name_Stop    : Integer;
            Status_Start : Integer;
            Status_Stop  : Integer;
         begin
            Get (Line, Pointer, SpaceAndTab); -- Skip blanks
            if Pointer > Line'Last then
               raise Data_Error with
                     "Name is expected at " & Where (Count, Pointer);
            end if;
            Name_Start := Pointer;
            Name_Stop  := Pointer - 1;
            while Pointer <= Line'Last loop
               case Line (Pointer) is
                  when '|' =>
                     Pointer := Pointer + 1;
                     exit;
                  when ' ' | Character'Val (8) =>
                     null; -- Truncate trailing blanks
                  when others =>
                     Name_Stop := Pointer;
               end case;
               Pointer := Pointer + 1;
            end loop;
            if Name_Start > Name_Stop then
               raise Data_Error with
                     "Name is empty at " & Where (Count, Name_Start);
            end if;
            Get (Line, Pointer, SpaceAndTab);
            if Pointer > Line'Last then
               raise Data_Error with
                     "Status is expected at " & Where (Count, Pointer);
            end if;
            Status_Start := Pointer;
            Status_Stop  := Pointer - 1;
            while Pointer <= Line'Last loop
               case Line (Pointer) is
                  when '|' =>
                     raise Data_Error with
                        "Unexpected field at " & Where (Count, Pointer);
                  when ' ' | Character'Val (8) =>
                     null;
                  when others =>
                     Status_Stop := Pointer;
               end case;
               Pointer := Pointer + 1;
            end loop;
            if Status_Start > Status_Stop then
               raise Data_Error with
                     "Status is empty at " & Where (Count, Status_Start);
            end if;
            Orders.Add
            (  Line (Name_Start..Name_Stop),
               Line (Status_Start..Status_Stop)
            );
         end;
      end loop;
   exception
      when End_Error =>
         Close (File);
      when Error : others =>
         Put ("Line " & Image (Count) & " error: ");
         Put (Exception_Information (Error));
         New_Line;
         Close (File);
         Set_Exit_Status (Failure);
         return;
   end;
   for Index in 1..Orders.Get_Size loop
      Put ("Name: " & Orders.Get_By_Status (Index).Name);
      Put (", Status: " & Orders.Get_By_Status (Index).Status);
      New_Line;
   end loop;
   Set_Exit_Status (Success);
exception
   when Error : others =>
      Put_Line ("Error:" & Exception_Information (Error));
      Set_Exit_Status (Failure);
end Test;

OneWingedShark · March 22, 2025, 4:08am

Sure; the way to get around this is to lean into the nature of the file you already have: that is, you can use a fixed-string as-needed.

As an example:

Package Example is
  Type Names(First_Length, Last_Length : Natural) is record
    First_Name : String(1..First_Length);
    Last_Name  : String(1..Last_Length);
  end record;
  Package Name_List is new Ada.Containers.Indefinite_Vector
     ([...] Element_Type => Names [...]);

  Function Get_Names(Input : in out File_Type) return Name_List.Vector;
End Example;

And we can stash all the “ugly bits” in the implementation:

Package Body Example is
  -- Note the nice interface.
  Function Get_Names(Input : in out File_Type) return Name_List.Vector is
    -- And here we make advantageous use of defaulting.
    Function Parse(Text : String := Get_Line(Input)) return Names is
      Delimiter : Constant String:= ", ";
      Comma     : Natural renames Ada.Strings.Fixed.Index
                         ( Pattern => Delimiter, Source => Text );
    Begin
      -- We assume a form of LASTNAME, FIRSTNAME
      if Comma not in Positive then
        Raise Constraint_Error with "Malformed line: " & Text;
      end if;

      From_Form;
      Declare
        Last  : String renames Text( Text'First..Natural'Pred(Comma) );
        First : String renames Text( Comma+Delimiter'Length..Text'Last );
      Begin
        Return Names'( First_Length => First'Length, Last_Length => Last'Length,
                       First_Name   => First,        Last_Name   => Last );
      End From_Form;
    End Parse;
  Begin
    Return Result : Name_List.Vector:= Name_List.Empty_Vector do
      Loop
        Result.Append( Parse ); -- We just keep appending new names until we run out.
      End Loop;
    Exception
      when End_Error => Close_File( Input );
    End return;
  End Get_Names;
End Example;

And that is how you can avoid using Unbounded_Strings, how you can “stash” tricky/ugly bits, and how you can leverage (a) renames, (b) defaults, and (c) exceptions to make the core-logic straightforward.

codiac · March 22, 2025, 7:47pm

This is a lot to digest. I haven’t had the opportunity to try out this code, but I appreciate it. At first glance, it seems to add a lot more complexity to my code. Tokenizing (parsing table data into a record) seems to be more intuitive and easy to read, but I’m open to exploring this new approach. I’d be interested to see the memory overhead and compare my implementation to yours and see how much it reduces the memory footprint and processing power. I don’t know the best way to benchmark my code. I was going to start another thread on that, like how to see the memory usage of an Ada executable. One thing I found out last week is that my antivirus program goes beserk when running the Ada executable, so that is why my processor was loaded so high. I thought it was because I wrote a really poorly coded program.

dmitry-kazakov · March 22, 2025, 9:29pm

Tokenizing parses into strings. You have to do a second pass to reparse strings, e.g. trimming or extracting numbers. Then you need to add error handing. Only then you can judge readability,

This is the problem with most code that uses libraries. They get you 80% there and the rest 20% requires total redesign. Be it tokens or containers. The worst thing ever is of course XML, JSON and similar meaningless stuff. It is OK for run-once programs, though.

Interesting, but unlikely. Unless you have to parse megabytes of text data, e.g. log files, then tokenizing and Unbounded_String can really bring the CPU to its knees.