UTF-8, Identifiers and dealing with Strings in Ada

Character and string literals in Ada are typed, and the compiler determines the type from context. Ada.Text_IO.Put_Line is defined as

procedure Put_Line (Item : in  String);

(ARM A.10.1), so the compiler expects a literal of type String. Type String is defined as

type String is array (Positive range <>) of Character;

(ARM 3.6.3), and type Character is Latin-1 (ARM 3.5.2). So the compiler is expecting a value of type String, which your string literal, containing characters outside Latin-1, is clearly not.

Note the different uses of “string”:

  • string: a general concept
  • string type: a language concept
  • type String: a predefined string type

These are similar to the uses of “integer”, “integer type”, and “type Integer”.
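As a minimal sketch of the distinction, the following declares a string type that is not type String and gives the same literal text two different types, chosen by the compiler from context. The names Vowel, Vowel_String and String_Types are invented for illustration.

```ada
with Ada.Text_IO;

procedure String_Types is
   --  A character type: an enumeration type with character literals.
   type Vowel is ('a', 'e', 'i', 'o', 'u');

   --  A string type: a one-dimensional array of a character type.
   type Vowel_String is array (Positive range <>) of Vowel;

   --  The same literal text, resolved to two different types from context.
   V : constant Vowel_String := "aeiou";
   S : constant String       := "aeiou";
begin
   Ada.Text_IO.Put_Line (S);
   Ada.Text_IO.Put_Line ("Vowel_String length:" & Integer'Image (V'Length));
end String_Types;
```

Replacing the literal for V with "xyz" would be rejected, because 'x' is not a literal of type Vowel; the same kind of check applies to type String and characters outside Latin-1.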

BUT …

If your editor saves your code as UTF-8, then the sequence of bytes in your file will be 16#22# (‘"’), followed by the bytes of the UTF-8 encoding of your string, followed by 16#22# (‘"’). Your compiler, if it’s not decoding the UTF-8, might interpret that as a string literal containing the Latin-1 characters corresponding to those bytes, and accept the string literal. Your program would then output those Latin-1 characters, but of course what is actually output is the representation of those characters, which is a sequence of bytes. If your output device expects UTF-8, it will decode those bytes and show the corresponding code points.
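The effect can be reproduced without any non-ASCII character in the source. This sketch builds by hand the String that a non-decoding compiler would see for a literal containing U+00E9, whose UTF-8 encoding is the two bytes 16#C3# 16#A9#. The sample character and the name UTF8_Bytes are my own choices, and the final line assumes a Text_IO implementation that writes each Character as a single byte.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure UTF8_Bytes is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  The UTF-8 encoding of U+00E9, held as two Latin-1 characters.
   Raw : constant String := Character'Val (16#C3#) & Character'Val (16#A9#);

   --  The same bytes decoded as UTF-8: a single code point.
   Decoded : constant Wide_Wide_String := UTF.Decode (Raw);
begin
   Ada.Text_IO.Put_Line ("Length as type String:" & Integer'Image (Raw'Length));
   Ada.Text_IO.Put_Line ("Code points when decoded:" & Integer'Image (Decoded'Length));

   --  Assumption: each Character is written as one byte, so a device
   --  expecting UTF-8 receives C3 A9 and shows one character.
   Ada.Text_IO.Put_Line (Raw);
end UTF8_Bytes;
```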

How you get a compiler to treat the UTF-8 source code as Latin-1 is compiler dependent. Your compiler appears to be decoding the UTF-8, resulting in an invalid literal for type String; the compiler used by ThyMYthOS appears not to. You will need to do one of three things: tell your compiler to interpret the source as Latin-1, rewrite your string literal so that it still contains only Latin-1 characters after UTF-8 decoding, or call a subprogram that uses a different string type.
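One possible shape for the third option, assuming an Ada 2012 compiler: hold the text as a Wide_Wide_String and encode it explicitly with Ada.Strings.UTF_Encoding.Wide_Wide_Strings (ARM A.4.11). The sample text and the name Encode_Demo are hypothetical, and the non-ASCII character is written with 'Val so that the sketch compiles the same way whether or not the compiler decodes UTF-8.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Encode_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical sample text: "caf" followed by U+00E9.
   Text : constant Wide_Wide_String :=
     "caf" & Wide_Wide_Character'Val (16#E9#);
begin
   --  Encode returns a UTF_8_String, which is a subtype of String,
   --  so Ada.Text_IO.Put_Line accepts the result.
   Ada.Text_IO.Put_Line (UTF.Encode (Text));
end Encode_Demo;
```

If your compiler is decoding UTF-8 source, the 'Val expression could presumably be replaced by the character itself inside the Wide_Wide_String literal, since that type is not limited to Latin-1.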

(Kazakov’s approach of restricting source code to ASCII results in it being interpreted the same regardless of whether or not your compiler decodes UTF-8. This enhances portability, but makes the code more difficult to understand.)

From a purely ARM point of view, which says nothing about how code is stored, the error is using a non-Latin-1 character in a string literal that will be interpreted as being of type String. Even if your compiler will accept this, it is still an error.