Initializing an array of Strings


Having read this thread I was wondering about the most efficient way to define an array of records with more than one variable string component. The following example works but looks quite clunky to me. (In reality the strings would be read in from a file instead of initialised as below.)

with Ada.Text_IO;
use  Ada.Text_IO;

procedure Test is

   subtype Length is Positive range 1 .. 20;

   type String_Record (Size1 : Length := 10;
                       Size2 : Length := 10) is
     record
        S1 : String (1 .. Size1);
        S2 : String (1 .. Size2);
     end record;

   type String_Array is array (1 .. 5) of String_Record;
   Strings : String_Array;

begin

   Strings (1) := (Size1 => 6, S1 => "purple",
                   Size2 => 3, S2 => "sky");
   Strings (2) := (Size1 => 4, S1 => "blue",
                   Size2 => 5, S2 => "river");

end Test;

Would I be better just sticking with Unbounded Strings? The total data read in is going to be a few hundred KB at most.

The question is why do you need an array of such records? Moreover, why do you need it initialized? Normally it is rather some container modified incrementally.

Otherwise, simply define a constructing function:

   function "/" (Left, Right : String) return String_Record is
   begin
      return (Left'Length, Right'Length, Left, Right);
   end "/";

Then use it like this:

   Strings (1 .. 2) := ("purple"/"sky", "blue"/"river");

P.S. A constrained String_Record may be even worse than Unbounded_String.
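For reference, here is a self-contained sketch of the constructor-function idea. The procedure name Test_Constructor and the unconstrained String_Array (sized by its initializer) are assumptions of this sketch, not part of the original example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Test_Constructor is

   subtype Length is Positive range 1 .. 20;

   type String_Record (Size1 : Length := 10;
                       Size2 : Length := 10) is
      record
         S1 : String (1 .. Size1);
         S2 : String (1 .. Size2);
      end record;

   --  Assumption: an unconstrained array type, so that the bounds
   --  come from the initializing aggregate.
   type String_Array is array (Positive range <>) of String_Record;

   --  Builds a record from two strings; raises Constraint_Error if
   --  either length falls outside 1 .. 20.
   function "/" (Left, Right : String) return String_Record is
   begin
      return (Size1 => Left'Length, Size2 => Right'Length,
              S1    => Left,        S2    => Right);
   end "/";

   Strings : constant String_Array :=
     ("purple" / "sky", "blue" / "river");

begin
   for S of Strings loop
      Put_Line (S.S1 & " " & S.S2);
   end loop;
end Test_Constructor;
```

The discriminants are derived from the arguments, so the caller never has to repeat the lengths by hand.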


A different way to approach it is to use indefinite vectors:

with Ada.Text_IO; use Ada.Text_IO;  
with Ada.Containers.Indefinite_Vectors;

procedure Test is

	package String_Vectors is new Ada.Containers.Indefinite_Vectors
		(Index_Type   => Positive,
		 Element_Type => String);
	subtype String_Vector is String_Vectors.Vector;  -- local rename for ease of use
	use type String_Vector; -- make operators like "&" visible
	
	type String_Record is
	record
		Strings : String_Vector := String_Vectors.Empty_Vector
			& "Hello"			-- This will be S1
			& (1..10 => <>);	-- This will be S2
	end record;
	S1 : constant Positive := 1;
	S2 : constant Positive := 2;
	
	v1 : String_Record;
	
begin
	v1.Strings(S2) := "0123456789"; -- range checked to ensure correct length
	Put_Line(v1.Strings(S1));
	Put_Line(v1.Strings(S2));
	
	-- If you want to change the size of one of the strings:
	v1.Strings.Replace_Element(S2, "World");
	Put_Line(v1.Strings(S1));
	Put_Line(v1.Strings(S2));
end Test;

I’d normally use something like that for ordered sequences such as days of the week or months of the year, where you’d typically want to write day(dd) or month(mm) instead of going through massive case statements or having to call a function.

The other thing I’d use it for is multilingual GUIs. There is the added complication of adding enums as well (e.g. str (E_what_day)). All methods are OK if you never have to maintain the code. The problem comes when:

a. A new string is added
b. A new language is added

Everything is easy when the number of strings is small (say 5), but when there are lots of strings (typically 50 to 100 for a simple system) and the modifications are dotted everywhere, it gets difficult.

For multilingual strings you will want to avoid fixed length strings and go to some sort of unbounded string (or string holder). There is no reliable way to guarantee that a word that fits in 10 letters in one language will fit in 10 letters in another language.

Why strings rather than Ada.Calendar types?

I do this outside the code. Normally a GUI framework has means for this. E.g. in GTK I use CSS file to define text constants with English fall backs.

In any case these are static strings. So you can simply use access to String.
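A minimal sketch of the access-to-String approach for static texts such as day names. The names Day, String_Ref and Name are hypothetical and chosen only for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Day_Names is

   type Day is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);

   --  Each element designates a String of its own length, so no
   --  common maximum length has to be chosen.
   type String_Ref is access constant String;

   Name : constant array (Day) of String_Ref :=
     (Mon => new String'("Monday"),
      Tue => new String'("Tuesday"),
      Wed => new String'("Wednesday"),
      Thu => new String'("Thursday"),
      Fri => new String'("Friday"),
      Sat => new String'("Saturday"),
      Sun => new String'("Sunday"));

begin
   for D in Day loop
      Put_Line (Name (D).all);
   end loop;
end Day_Names;
```

Indexing by the enumeration type gives the Name (D) notation directly, without a case statement.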

I have a C++/C#/Python background. I’d use variable length strings for such things in those languages. It would possibly be the wrong choice for Ada and possibly other languages but if I wanted to do something quickly and didn’t know a lot about the libraries, that is what I would do.

Yes, you just do not have a choice there. None of these languages have strings (arrays) as integral objects, e.g. ones you could marshal, put in a shared memory etc. Ada’s equivalent of these is Unbounded_String. So you can choose.

As for libraries, it is exactly the reverse. Ada strings need no libraries to handle. On the contrary, Unbounded_String is a library thing, and the same is true for strings in C++/C#/Python.

Major problems with Unbounded_String are:

  • lack of an array interface (index, literals, aggregates, slices)
  • use of a memory pool, which makes them problematic in tasking (need for locking) and in initialization (whereas a String constant can be put into an initialized section)
  • sliding indices. Unbounded_String is always 1-based, which makes some text processing algorithms difficult. A String index does not slide when you take a slice.
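The index point can be illustrated with a small sketch. The sample text and the names Text, Tail and U are made up for this example.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Slice_Demo is
   Text : constant String := "key=value";

   --  A slice of a String keeps the indices of the original: 5 .. 9.
   Tail : String renames Text (5 .. 9);

   --  The same slice as an Unbounded_String is re-based to start at 1.
   U : constant Unbounded_String :=
     Unbounded_Slice (To_Unbounded_String (Text), 5, 9);
begin
   Put_Line (Tail & Integer'Image (Tail'First));  --  Tail'First is 5
   Put (Element (U, 1));                          --  'v' is now at index 1
   New_Line;
end Slice_Demo;
```

With String, a position found in the slice is still a valid position in the original text; with Unbounded_String the offset has to be added back by hand.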

The “most efficient way” is whatever meets your quantified timing requirements while minimizing the effort needed by the developer. For most problems/platforms/compilers, this will be Unbounded_String.

Not really. Unbounded_String might ease efforts to implement something but definitely complicates its use. Unbounded_String has a very uncomfortable interface. In fact, most cases involve multiple conversions of Unbounded_String to String in order to be able to do anything useful with it.


I could use a collection of individual arrays. My data is a collection of items with attributes which to me makes more sense to group into a record and create an array of those records.

Example

type Item is record
  Description : [some sort of string];
  Size        : Integer;
  Colour      : [some sort of string];
end record;
type Item_Array is array of Item;
Items : Item_Array;

Instead of

type Description_Array is array of [some sort of string];
type Size_Array        is array of Integer;
type Colour_Array      is array of [some sort of string];
Item_Descriptions : Description_Array;
Item_Sizes        : Size_Array;
Item_Colours      : Colour_Array;

I have been using Unbounded_String up till now without difficulty (I’ve also done some experiments with Indefinite_Vectors which seem pretty useful).
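For concreteness, a sketch of the record-based layout with Unbounded_String filling the "[some sort of string]" slots. The unary "+" renaming, the unconstrained Item_Array and the sample data are assumptions of this sketch.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Items_Demo is

   type Item is record
      Description : Unbounded_String;
      Size        : Integer := 0;
      Colour      : Unbounded_String;
   end record;

   type Item_Array is array (Positive range <>) of Item;

   --  Common shorthand to keep aggregates readable.
   function "+" (Source : String) return Unbounded_String
     renames To_Unbounded_String;

   Items : constant Item_Array :=
     ((+"coat",  42, +"purple"),
      (+"scarf",  3, +"blue"));

begin
   for It of Items loop
      Put_Line (To_String (It.Description) & ":"
                & Integer'Image (It.Size) & " "
                & To_String (It.Colour));
   end loop;
end Items_Demo;
```

When the items are read from a file, an Ada.Containers.Vectors instance with Item as element type could replace the fixed array.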

Thanks everyone for the various suggestions.

You may consider UXStrings with list of strings which provides a string container:
UXSL1 : constant UXStrings.Lists.UXString_List := ["Ada", "Strings", "Wide_Wide_Maps", "Wide_Wide_Constants", "Lower_Case_Map"];

I agree that your Item record is in 99% of the cases the way to go.

But note that if you’re desperately chasing wasted CPU cycles, the better choice is your “instead of” proposal, because of the massive effect of data locality on performance.

For an intro: Data Locality · Optimization Patterns · Game Programming Patterns
There are tons of videos on the subject on YouTube (search for “Data Oriented Design”).

The time to access things is highly variable, we all know that.
Roughly speaking, in CPU cycles:

  • register: 1 or 2
  • L1 cache: 3 to 5
  • L2 cache: 10 to 20
  • L3 cache: 40 to 75
  • RAM: 60 to 100
  • disk, cloud: …

One impact that is hard to integrate for an old programmer like me is the change of dimension in cache. In a laptop CPU, there is 32 to 64 kB of L1 data cache per core, 256 kB to 1 MB of L2 per core, and 4 to 32 MB of L3.
Meaning that there is much more L2 cache memory on my laptop than the whole memory of my first PC.

But even if memory is no longer a problem in common programming, I’m still reluctant to allocate bounded strings or other “max sized” items.
Let’s say I need to manipulate a text. If I bound my identifiers to a (generous) maximum of 128 8-bit characters, I feel bad because using 128 bytes per identifier seems a huge waste.
But at that size a text of 2 000 to 8 000 identifiers will fit in L2 cache, and a text of 32 000 to 256 000 identifiers will fit in L3.

Another huge improvement is in the internal branch prediction mechanisms of CPUs.

I retained from some of those presentations that just packing data in a list ADT, implemented as a linked list but not scattered throughout memory, results in roughly a factor of 50 improvement in text search speed because of memory locality.
But if you then remove the pointers and use array indexing, you gain another factor of 2, because the prediction mechanisms work better.

I have had no occasion to test that so far, but I suspect that changing my habit of using unbounded strings in structures linked together by reference to arrays of bounded strings may result in a major improvement, with acceptable memory usage in most cases.
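A minimal sketch of what such an array of bounded strings could look like, using Ada.Strings.Bounded with the 128-character limit from the example above. The names Identifiers, Identifier_Table, Table and Target are hypothetical, and no timing claim is attached to this code.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Bounded;

procedure Bounded_Table is

   package Identifiers is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 128);
   use Identifiers;

   --  Elements are stored in the array itself rather than through
   --  references, so the table is one contiguous block of memory.
   type Identifier_Table is array (Positive range <>) of Bounded_String;

   Table : constant Identifier_Table :=
     (To_Bounded_String ("alpha"),
      To_Bounded_String ("beta"),
      To_Bounded_String ("gamma"));

   Target : constant String := "beta";

begin
   --  Linear search by array index, with no pointer chasing.
   for I in Table'Range loop
      if Table (I) = Target then
         Put_Line ("found at" & Integer'Image (I));
      end if;
   end loop;
end Bounded_Table;
```

Whether this layout actually beats linked unbounded strings would have to be measured on the target platform.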

I would be very interested in any data that puts together code snippets like the ones given in this discussion with their memory footprints and execution times, so that I could choose the right balance between understandable code and performance, and make an informed initial choice based on the circumstances.


Understandable code is usually more performant. When you use containers (be it Unbounded_String or Bounded_String) instead of arrays, iterators/cursors instead of indices, plus some functional mess on top of it, you lose both. It is difficult both for the machine and for the reader to understand what is going on. If containers could have simple interfaces like arrays, it would be only the machine. Unfortunately they cannot.