I know, at least in theory, how to handle UTF-8. However, there is one aspect you are not taking into account: programmer efficiency.
Out of the box, Ada has only a few packages with UTF-8 support:
Ada.Strings.UTF_Encoding
Ada.Strings.UTF_Encoding.Conversions
Ada.Strings.UTF_Encoding.Strings
Ada.Strings.UTF_Encoding.Wide_Strings
Ada.Strings.UTF_Encoding.Wide_Wide_Strings
Which support only Encode and Decode functionality. There is none of the following:
Ada.Float_UTF_Text_IO
Ada.Integer_UTF_Text_IO
Ada.Strings.UTF_Bounded
Ada.Strings.UTF_Bounded.UTF_Equal_Case_Insensitive
Ada.Strings.UTF_Bounded.UTF_Hash
Ada.Strings.UTF_Bounded.UTF_Hash_Case_Insensitive
Ada.Strings.UTF_Equal_Case_Insensitive
Ada.Strings.UTF_Fixed
Ada.Strings.UTF_Fixed.UTF_Equal_Case_Insensitive
Ada.Strings.UTF_Fixed.UTF_Hash
Ada.Strings.UTF_Fixed.UTF_Hash_Case_Insensitive
Ada.Strings.UTF_Hash
Ada.Strings.UTF_Hash_Case_Insensitive
Ada.Strings.UTF_Maps
Ada.Strings.UTF_Maps.UTF_Constants
Ada.Strings.UTF_Unbounded
Ada.Strings.UTF_Unbounded.UTF_Equal_Case_Insensitive
Ada.Strings.UTF_Unbounded.UTF_Hash
Ada.Strings.UTF_Unbounded.UTF_Hash_Case_Insensitive
Ada.UTF_Characters
Ada.UTF_Characters.Handling
Ada.UTF_Command_Line
Ada.UTF_Directories
Ada.UTF_Directories.Hierarchical_File_Names
Ada.UTF_Directories.Information
Ada.UTF_Environment_Variables
Ada.UTF_Text_IO
Ada.UTF_Text_IO.Complex_IO
Ada.UTF_Text_IO.Editing
Ada.UTF_Text_IO.Text_Streams
Ada.UTF_Text_IO.UTF_Bounded_IO
Ada.UTF_Text_IO.UTF_Unbounded_IO
As such, Ada supports only the following development paradigm for UTF-8 out of the box and with reasonable effort:
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

function Do_Something (Value : String) return String is
   use Ada.Strings.UTF_Encoding;
   use Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
   --  The trailing blank reserves room for the character inserted below.
   Temp : Wide_Wide_String := Decode (Value, UTF_8) & ' ';
begin
   --  Do something with Temp using the many Wide_Wide_String
   --  packages. Example: insert 'Ä' in the middle of the text.
   Temp := @ (@'First .. @'Last / 2) & 'Ä' & @ (@'Last / 2 + 1 .. @'Last - 1);
   return Encode (Temp, UTF_8);
end Do_Something;
And I see nothing wrong with it. I don’t even think it is significantly slower than working directly on a UTF-8 String. It is, however, significantly easier to implement.
Working directly on the UTF-8 String, even that one example line would need two loops to implement: one to find out how many characters are in the string and one to find the middle character. Only then can you make the insertion. If there were any more to do, it would become a major endeavour.
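To make the two-loop point concrete, here is a rough sketch of what the same insertion could look like when done directly on the UTF-8 bytes. Insert_Middle and Is_Continuation are hypothetical names used only for illustration, not part of any standard package, and the sketch assumes the input is valid UTF-8 (no validation is done).

```ada
with Ada.Text_IO;

procedure Insert_Middle_Demo is

   --  UTF-8 encoding of 'Ä' (U+00C4), spelled out as bytes so the
   --  example does not depend on the source file encoding.
   A_Umlaut : constant String :=
     Character'Val (16#C3#) & Character'Val (16#84#);

   --  Hypothetical helper: inserts the UTF-8 sequence Item after the
   --  first Count / 2 code points of Source. Assumes valid UTF-8.
   function Insert_Middle (Source, Item : String) return String is

      --  UTF-8 continuation bytes have the bit pattern 10xxxxxx.
      function Is_Continuation (C : Character) return Boolean is
        (Character'Pos (C) in 16#80# .. 16#BF#);

      Count : Natural := 0;                 --  code points in Source
      Seen  : Natural := 0;                 --  code points passed so far
      Split : Natural := Source'First - 1;  --  last byte of the first half
   begin
      --  Loop 1: count code points (every non-continuation byte
      --  starts one).
      for C of Source loop
         if not Is_Continuation (C) then
            Count := Count + 1;
         end if;
      end loop;

      --  Loop 2: advance Split to the last byte of code point Count / 2.
      for I in Source'Range loop
         if not Is_Continuation (Source (I)) then
            exit when Seen = Count / 2;
            Seen := Seen + 1;
         end if;
         Split := I;
      end loop;

      return Source (Source'First .. Split) & Item
        & Source (Split + 1 .. Source'Last);
   end Insert_Middle;

begin
   --  Plain ASCII check, then a check with a multi-byte first character.
   pragma Assert (Insert_Middle ("abcd", "X") = "abXcd");
   pragma Assert (Insert_Middle (A_Umlaut & "b", "X") = A_Umlaut & "Xb");
   Ada.Text_IO.Put_Line (Insert_Middle ("abcd", A_Umlaut));
end Insert_Middle_Demo;
```

That is already some forty lines for a single insertion, and it still does no error handling for malformed input, which is the programmer efficiency cost I mean.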
Do note that AdaCL is an object-oriented library meant for desktop computers (macOS, Windows, Linux). It makes liberal use of tagged types, heap memory via smart pointers, unbounded strings, and collections in all of its components.
AdaCL is not suitable or meant for embedded or real-time programming, where 10 KB of temporary memory or 1000 CPU cycles would make a difference.
However, I will try to upgrade Björn Persson’s EAStrings, which properly supports both UTF-8 and UTF-16, to Ada 2022. There seems to be a need for it.