The library provides string handling facilities such as I/O formatting, Unicode support and support for obsolete code pages.
This update provides full Unicode normalization support. Unicode normalization brings equivalent character sequences, which usually look the same, to a common form so that they can be compared. In particular, normalization applies to letters with diacritic marks (ü = u + ◌̈), ligatures (ﬁ = fi), symbols (the Ohm sign Ω = the Greek capital letter omega Ω), subscripts and superscripts. However, normalization does not equate look-alike Latin and Cyrillic letters (such as Latin a and Cyrillic а), nor does it apply to ligatures like the German ß.
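The examples above can be checked with Python's standard `unicodedata` module. It is used here only as a stand-in for Strings_Edit.UTF8.Normalization, whose Ada interface is not shown in this note. The sketch prints the code points of each sample character after canonical decomposition (NFD) and after compatibility normalization (NFKC). The helper `code_points` and the sample set are illustrative assumptions.

```python
import unicodedata


def code_points(s: str) -> list:
    """Return the code points of s in U+XXXX notation (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in s]


# Sample characters taken from the description above (illustrative choice).
examples = {
    "u with diaeresis": "\u00FC",  # ü, precomposed
    "fi ligature": "\uFB01",       # ﬁ
    "Ohm sign": "\u2126",          # Ω (U+2126)
    "superscript two": "\u00B2",   # ²
    "sharp s": "\u00DF",           # ß
    "Cyrillic a": "\u0430",        # а, looks like Latin a
}

for name, s in examples.items():
    nfd = unicodedata.normalize("NFD", s)    # canonical decomposition
    nfkc = unicodedata.normalize("NFKC", s)  # compatibility decomposition, then canonical composition
    print(f"{name:17} NFD={code_points(nfd)} NFKC={code_points(nfkc)}")

# Look-alike letters from different scripts stay distinct after normalization.
print(unicodedata.normalize("NFKC", "\u0430") == "a")  # False
```

The precomposed ü splits into u + U+0308 under NFD. The Ohm sign maps to the Greek omega U+03A9 even canonically. The ﬁ ligature and the superscript change only under the compatibility forms. The ß and the Cyrillic а are left as they are.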
Changes from the previous version:
- Function Compare to compare arrays of code points was added to the package Strings_Edit.UTF8;
- Function Image to convert an array of code points to a UTF-8 string was added to the package Strings_Edit.UTF8;
- Function Compare to compare arrays of code points using a code point mapping was added to the package Strings_Edit.UTF8.Maps;
- The package Strings_Edit.UTF8.Normalization was added to provide Unicode decomposition (NFD and NFKD), composition, normalization (NFC and NFKC) and comparison of normalized strings. The canonical composition rules are respected;
- The application Strings_Edit.UTF8.Normalization_Generator was added to support updates of the UnicodeData.txt database;
- The test case test_utf8 was added.
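Comparison of normalized strings can be sketched in Python, again with the standard `unicodedata` module standing in for Strings_Edit.UTF8.Normalization. Both strings are brought to the same normal form, and the resulting code point arrays are then compared lexicographically. The helpers `to_code_points`, `compare` and `compare_normalized` are hypothetical and are not the library's Ada API. Ordering by code point value is also an assumption.

```python
import unicodedata

# Hypothetical helpers illustrating the idea; not the Strings_Edit Ada API.


def to_code_points(s: str) -> list:
    """Convert a string into an array of code points."""
    return [ord(c) for c in s]


def compare(left: list, right: list) -> int:
    """Lexicographic comparison of two code point arrays: -1, 0 or 1."""
    for a, b in zip(left, right):
        if a != b:
            return -1 if a < b else 1
    # One array is a prefix of the other: the shorter one is less.
    return (len(left) > len(right)) - (len(left) < len(right))


def compare_normalized(left: str, right: str, form: str = "NFD") -> int:
    """Compare two strings after bringing both to the same normal form."""
    return compare(to_code_points(unicodedata.normalize(form, left)),
                   to_code_points(unicodedata.normalize(form, right)))


precomposed = "\u00FC"  # ü as one code point
decomposed = "u\u0308"  # u + combining diaeresis
print(compare(to_code_points(precomposed), to_code_points(decomposed)))  # 1
print(compare_normalized(precomposed, decomposed))                      # 0

# Canonical reordering: dot below (class 220) and dot above (class 230)
# in either order are canonically equivalent.
print(compare_normalized("a\u0323\u0307", "a\u0307\u0323"))             # 0

# The fi ligature equals "fi" only under compatibility normalization.
print(compare_normalized("\uFB01", "fi", "NFD"))                        # 1
print(compare_normalized("\uFB01", "fi", "NFKD"))                       # 0
```

Normalizing both operands to one form before comparing makes a precomposed ü equal to u followed by a combining diaeresis. Choosing NFKD instead of NFD additionally equates compatibility variants such as the ﬁ ligature.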