UTF-8, Identifiers and dealing with Strings in Ada

Interesting. If I add pragma Wide_Character_Encoding ( UTF8 );, I get the same error. Note that my example had no such pragma. I also get the error message when -gnatW8 is present or when I add a BOM. There is also -gnatiw, but it does not seem to make a difference.

See also "Character Set Control" in the GNAT User's Guide, "UTF-8 encoding in GNAT" on ada-lang.io (an Ada community site), and the recent discussion about Unicode strings.

So if GNAT is not told that the input is UTF-8, and therefore knows nothing about code points, it treats UTF-8 strings “literally”: the String gets as many bytes as the literal occupies in the source file. When you tell GNAT to “detect” UTF-8 encoded code points, it refuses to store code points that do not fit in Character (such as the Japanese text here) in a String. If you instead use:

pragma Wide_Character_Encoding ( UTF8 );
with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;
procedure Main is
begin
  Put_Line("こんにちは");
end Main;

It compiles, and the output is UTF-8 encoded (not UTF-16, as you might expect)!

./main | hexdump
0000000 81e3 e393 9382 81e3 e3ab a181 81e3 0aaf
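
Note that hexdump's default format prints 16-bit words in host byte order (little-endian here), so the actual byte stream is e3 81 93 e3 82 93 e3 81 ab e3 81 a1 e3 81 af 0a: five 3-byte UTF-8 sequences followed by a newline.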
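
If the goal is to keep UTF-8 in a plain String and convert explicitly, the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Strings can do that. Below is a minimal sketch, not taken from the posts above: the procedure name Utf8_Demo is made up, and the three bytes are the UTF-8 encoding of こ (U+3053), written with Character'Val so that the result should not depend on -gnatW8 or the pragma.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

--  Hypothetical demo procedure; the name is an assumption for illustration.
procedure Utf8_Demo is
   package UTF   renames Ada.Strings.UTF_Encoding;
   package UTF_W renames Ada.Strings.UTF_Encoding.Wide_Strings;

   --  UTF-8 encoding of U+3053, spelled out byte by byte so the source
   --  file itself stays pure ASCII.
   Bytes : constant UTF.UTF_8_String :=
     Character'Val (16#E3#) & Character'Val (16#81#) & Character'Val (16#93#);

   --  Decode the byte string into one Wide_Character per code point.
   Decoded : constant Wide_String := UTF_W.Decode (Bytes);
begin
   --  Length as a String: counts bytes.
   Ada.Text_IO.Put_Line ("bytes:" & Natural'Image (Bytes'Length));

   --  Length as a Wide_String: counts code points.
   Ada.Text_IO.Put_Line ("code points:" & Natural'Image (Decoded'Length));

   --  Encoding the Wide_String again should give back the same bytes.
   Ada.Text_IO.Put_Line
     ("round trip ok: " & Boolean'Image (UTF_W.Encode (Decoded) = Bytes));
end Utf8_Demo;
```

This should report 3 bytes and 1 code point, which is the same bytes-versus-code-points distinction GNAT makes between the two modes described above.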