ANN: Strings edit v3.10

dmitry-kazakov · December 4, 2025, 3:28pm

The library provides string handling facilities such as I/O formatting and support for Unicode and obsolete code pages.

This update provides full Unicode normalization support. Unicode normalization is intended to equate glyphs that look the same so that they can be compared. In particular, normalization applies to diacritic marks like ü = u + ◌̈, ligatures like ﬁ = fi, symbols like Ω = Ohm symbol, subscripts, and superscripts. However, normalization does not equate look-alike Latin and Cyrillic letters, nor does it apply to ligatures like German ß.
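As a rough illustration of these rules, here is a minimal sketch in Python using the standard `unicodedata` module. It is not the Strings_Edit API; the helper `equal_under` is a hypothetical name chosen for this example.

```python
import unicodedata

def equal_under(form, left, right):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)

# Precomposed ü (U+00FC) versus u followed by combining diaeresis (U+0308)
composed = "\u00fc"
decomposed = "u\u0308"
print(composed == decomposed)                    # raw code points differ
print(equal_under("NFC", composed, decomposed))  # canonically equivalent

# The ligature ﬁ (U+FB01) has only a compatibility decomposition,
# so it matches "fi" under NFKC but not under NFC
print(equal_under("NFC", "\ufb01", "fi"))
print(equal_under("NFKC", "\ufb01", "fi"))

# The Ohm sign (U+2126) canonically maps to Greek capital omega (U+03A9)
print(equal_under("NFC", "\u2126", "\u03a9"))

# Latin A and Cyrillic А (U+0410) stay distinct, and so do ß and "ss"
print(equal_under("NFKC", "A", "\u0410"))
print(equal_under("NFKC", "\u00df", "ss"))
```

The canonical forms (NFD, NFC) handle cases like ü and the Ohm sign, while the compatibility forms (NFKD, NFKC) additionally fold ligatures, subscripts, and superscripts.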

Changes to the previous version:

Function Compare to compare arrays of code points was added to the package Strings_Edit.UTF8;
Function Image to convert a code points array to UTF-8 string was added to the package Strings_Edit.UTF8;
Function Compare to compare arrays of code points using a code points mapping was added to the package Strings_Edit.UTF8.Maps;
The package Strings_Edit.UTF8.Normalization was added to provide Unicode decompositions (NFD and NFKD), composition, normalization (NFC, NFKC) as well as comparisons of normalized strings. The canonical composition rules are respected;
The application Strings_Edit.UTF8.Normalization_Generator was added to support updates of the UnicodeData.txt database;
The test case test_utf8 was added.

AJ-Ianozi · December 6, 2025, 5:02am

Thanks @dmitry-kazakov your work is awesome as always and much appreciated!

reinert · December 8, 2025, 9:24pm

Could it be a useful idea to make typesetting functions that take LaTeX-style formatted strings as input?

dmitry-kazakov · December 9, 2025, 9:54am

IMO parsing LaTeX is very simple. You can always determine where the current token ends, and there are no precedence rules in play. The first issue arises when you must match the longest alternative from a table; another is infix expressions.

Of course, you need some lookup context with tables containing the names of visible macros, and you must reparse macro expansions.

Maybe you meant something specific?

reinert · December 10, 2025, 6:11am

I find LaTeX typesetting syntax useful for organising text output under OpenGL (GLOBE_3D). I have only made some simple attempts. It would be great to use LaTeX syntax for terminal output. One reason is that I have LaTeX at my fingertips :-)

dmitry-kazakov · December 10, 2025, 8:42am

All LaTeX syntax is \xyz{}, repeat. There is basically nothing to parse. Or do you mean the reverse action, rendering LaTeX?
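To illustrate how little there is to tokenize, here is a minimal sketch in Python for a simplified subset (no comments, no math mode, no macro expansion). The token class names and the function `tokenize` are assumptions made for this example, not part of any library.

```python
import re

# Hypothetical token classes for a simplified LaTeX subset
TOKEN = re.compile(
    r"""
    (?P<command>\\[A-Za-z]+)   # control word: backslash followed by letters
  | (?P<symbol>\\.)            # control symbol: backslash plus one other character
  | (?P<open>\{)               # start of a group or argument
  | (?P<close>\})              # end of a group or argument
  | (?P<text>[^\\{}]+)         # run of ordinary characters
    """,
    re.VERBOSE | re.DOTALL,
)

def tokenize(source):
    """Return a list of (kind, lexeme) pairs for the simplified subset."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN.match(source, position)
        if match is None:  # only a trailing lone backslash can get here
            raise ValueError(f"unexpected character at offset {position}")
        tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

for kind, lexeme in tokenize(r"Some \emph{important} text"):
    print(kind, repr(lexeme))
```

Each token ends at a point that can be decided without any lookahead table or precedence rule, which is why a single regular expression is enough here.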

reinert · December 14, 2025, 5:02pm

For example, I would like to emphasize some words with special fonts or colours inside a text. Math typeset output would of course be nice.
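A minimal sketch of the idea for terminal output, in Python: it translates only `\emph{..}` and `\textcolor{name}{..}` into ANSI escape codes and passes everything else through unchanged. The function `render`, the colour table, and the choice of bold for emphasis are all assumptions made for this example.

```python
import re

RESET = "\x1b[0m"
EMPHASIS = "\x1b[1m"  # bold; an assumption, italics would also be reasonable
COLOURS = {"red": "\x1b[31m", "green": "\x1b[32m", "blue": "\x1b[34m"}

COMMAND = re.compile(r"\\(emph|textcolor)\{")

def render(source):
    """Translate \\emph{..} and \\textcolor{name}{..} into ANSI escape codes."""
    output = []
    active = []  # escape codes of the enclosing styled groups
    groups = []  # one flag per open brace: True if it belongs to a styled command
    position = 0
    while position < len(source):
        match = COMMAND.match(source, position)
        if match:
            position = match.end()
            if match.group(1) == "emph":
                code = EMPHASIS
            else:
                end = source.index("}", position)     # ValueError if unterminated
                code = COLOURS[source[position:end]]  # KeyError if colour unknown
                if source[end + 1:end + 2] != "{":
                    raise ValueError("\\textcolor needs a second {..} argument")
                position = end + 2
            active.append(code)
            groups.append(True)
            output.append(code)
        elif source[position] == "{":
            groups.append(False)  # plain group: braces are dropped, no styling
            position += 1
        elif source[position] == "}":
            if not groups:
                raise ValueError("unbalanced }")
            if groups.pop():
                active.pop()
                # Reset everything, then restore the styles still in effect
                output.append(RESET + "".join(active))
            position += 1
        else:
            output.append(source[position])
            position += 1
    if groups:
        raise ValueError("unbalanced {")
    return "".join(output)

print(render(r"Plain, \emph{emphasized}, and \textcolor{red}{red with \emph{bold} inside}."))
```

Because ANSI has no "end this one style" code that works everywhere, the sketch resets all attributes when a group closes and re-emits the codes of the groups that are still open.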

dmitry-kazakov · December 14, 2025, 5:16pm

Yes, of course. I use it in a similar case for HTML output.