Plan for next version of International Ada Standard

And the point is that these operations are bad software design and style. I already mentioned local vs. global programming. The functional approach promotes global programming when operations involve the whole container. It is necessarily inefficient, unreadable and constrained to certain data structures and certain operations. The example I also gave is extracting an element of a matrix by multiplying it by two vectors.

Good, because they are bad algorithms. Under no circumstances should you be required to rotate an array. Such ideas lead to reading a 2GB log file into memory and then rotating it. It might appear absurd, but this is exactly how programmers trained in the “functional” style solve problems.

Rosetta code exercises and home work assignments have nothing to do with actual software.

Hmm. I wasn’t really thinking that it was based on this, though I did look for Dewar’s thoughts on Ada 95. Now that you mention it: if, as I have heard many say, Simula 67/Smalltalk-esque message passing is the important part and the way to do OOP, then that isn’t the OOP that Ada provides for. Or is it?

No. Ada OO was designed in a strongly typed way. In particular, you never get the “message not understood” error characteristic of untyped OOPLs, where you can try any method (send any message) on any object.

OO, simply put, is a method of programming in terms of sets of types. It differs from generics by allowing run-time instances of the class in addition to instances of specific types.
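
For concreteness, a minimal sketch (made-up names, two separate library units): a generic is instantiated for one specific type, fixed at compile time, while a tagged type also gives you class-wide values whose specific type is only known at run time.

generic
   type Element is private;
package Pairs is
   --  Static polymorphism: each instantiation fixes Element at compile time.
   type Pair is record
      First, Second : Element;
   end record;
end Pairs;

package Animals is
   --  Dynamic polymorphism: Animal'Class has run-time instances of the class.
   type Animal is abstract tagged null record;
   function Sound (A : Animal) return String is abstract;

   type Dog is new Animal with null record;
   overriding function Sound (D : Dog) return String is ("woof");

   type Cat is new Animal with null record;
   overriding function Sound (C : Cat) return String is ("meow");
end Animals;

A parameter of type Animal'Class may hold a Dog or a Cat chosen at run time, and Sound dispatches accordingly; an instance of Pairs is tied to one element type when the program is compiled.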

There also is/was the church of object-orientation, which considered the problem space to be constructed of objects and insisted on identifying such objects and programming their software counterparts.

These two are unrelated. You must have generic programming in terms of sets of types in some form. All OO bashers advocate for generics; only a few are really ready to go back to full cut-and-paste mode. Remember, the idea of a class, be it generic or dynamic, is to reuse implementations across the whole class.

Now, leaving aside emotions and prejudices, one could simply compare the features and capabilities of both approaches: static parametric polymorphism vs. dynamic polymorphism.

Unfortunately nobody, including the late Robert, ever does that.

I’ll do it, if you like … :wink:

Static parametric polymorphism is great as a way to share algorithms across similar types of things, but it is not useful in managing heterogeneous data structures, where you have various different kinds of entities in a single data structure. Variant records are one approach for handling heterogeneous data structures, but they can be a maintenance problem if the set of different kinds of entities in the data structure is open-ended, or is frequently changing.

I would say the “one liner” that captures the value of dynamic OO-style polymorphism is “case statements considered harmful” (along with variant records). I know that before OO, a compiler tended to be a morass of case statements, to deal with the heterogeneous tree-ish data structures used to represent a program at various stages during compilation.

At least with languages like Ada you can minimize the use of “when others” (or equivalent) and rely on the compiler to identify for you, when you add a new variant, all the case statements that need to be updated. But if you start having a number of case statements with “when others” or equivalent, things can become a real maintenance challenge.
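
As a minimal illustration (made-up names, not from any project mentioned here): a closed set of node kinds as a variant record, with a case expression that names every kind and has no “when others”. Adding a literal to Node_Kind then makes the compiler reject this function, and every case like it, until it is updated.

package Variant_Nodes is

   type Node_Kind is (Literal_Node, Name_Node, Plus_Node);

   type Node (Kind : Node_Kind := Literal_Node) is record
      case Kind is
         when Literal_Node => Value : Integer;
         when Name_Node    => Name  : String (1 .. 8);
         when Plus_Node    => null;   --  operand links omitted in this sketch
      end case;
   end record;

   --  No "when others": a new Node_Kind literal shows up here as a compile error.
   function Image (N : Node) return String is
     (case N.Kind is
         when Literal_Node => Integer'Image (N.Value),
         when Name_Node    => N.Name,
         when Plus_Node    => "+");

end Variant_Nodes;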

On the other hand, OO can also create maintenance challenges, especially if you are using inheritance heavily and overriding some but not all of the inherited operations. Ada reduces the problem by making static binding the default in calls between the primitive operations of a given type, but if you end up using dynamic binding a lot, heavy use of inheritance can create a troublesome amount of coupling between primitive operations. Microsoft MFC classes were infamous for maintenance challenges in this area: when Microsoft released a new version of MFC and the inter-operation dependencies were not identical to the earlier version, existing MFC clients would not necessarily keep working the same way.
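
A minimal sketch of the binding rules (made-up names): within a primitive operation, a call to another primitive of the same specific type is statically bound; converting the parameter to the class-wide type forces a redispatch, and it is those redispatching calls that create the inter-operation coupling described above.

package Widgets is
   type Widget is tagged null record;
   procedure Draw_Border (W : Widget);
   procedure Draw        (W : Widget);
end Widgets;

package body Widgets is

   procedure Draw_Border (W : Widget) is
   begin
      null;   --  draw a plain border in this sketch
   end Draw_Border;

   procedure Draw (W : Widget) is
   begin
      --  The two forms are shown side by side only to contrast them.
      Draw_Border (W);                   --  static: always Widgets.Draw_Border
      Draw_Border (Widget'Class (W));    --  redispatch: reaches an override of
                                         --  Draw_Border in a derived type
   end Draw;

end Widgets;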

So I think I am agreeing with you that both sorts of polymorphism have their place.

Yes, both are needed.

What I appreciate in OO is, admittedly, a remote opportunity of simplifying and unifying the language by expressing hard-wired constructs in terms of interfaces and inheritance. It is a very long way, with the roadblocks of MI and MD. But in the end I want a type system capable of describing complex relationships between types like array, index, element, range, slice. Your case statement example is in that vein.

That’s highly opinionated. If I didn’t think you were acting in good faith, I’d say it’s just dismissive.

It’s difficult to understand this pushback. What’s so different about Sort or Find, versus Map, Rotate, Shuffle or Stable_Partition, that the first two deserve to be included in Ada and the others don’t? I’m asking rhetorically; please do not trouble yourself answering, I already know what you think.

Please note that I am not talking about those OCaml pipelines that would generate wasteful state mutation (which, even if it exists, may be acceptable!). Ada doesn’t need such machinery; it has its own way of doing things. However, this style is used extensively in places such as R’s Tidyverse and, increasingly, in mainstream general-purpose programming languages. Of course, that’s a void argument to those who consider it “not real work”, “not real programming”, bad engineering or homework-assignment-tier work.

You’re right. If you’re using Linux, apply GitHub - sowebio/adel-doc: Ada Development Environment on Linux based on the latest GNAT, GNATStudio and Alire to get an Ada environment with Alire (the document explains some of Alire’s subtleties not described elsewhere), and then use UXStrings and/or v22 (which includes UXStrings) to develop console or web apps in UTF-8. A new v25 framework, now called LibreFrame, is coming with a new graphical interface. All of this was presented at the AEIC Ada 2025 workshop.

Sure. It is all about software decomposition. I have a very strong opinion that software shall not be decomposed into the operations (“algorithms”) you listed. I explained why on grounds of readability and performance.

Sort does not deserve it.

Find is OK, being the basic operation of an ordered container. Such containers are designed with the main purpose of providing Find.
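
For instance, with the standard ordered map the whole point of maintaining the key ordering is the logarithmic Find (a small sketch with made-up data):

with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Text_IO;

procedure Find_Demo is
   package Price_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type => String, Element_Type => Integer);
   use Price_Maps;

   Prices : Map;
   Pos    : Cursor;
begin
   Prices.Insert ("coffee", 4);
   Prices.Insert ("tea",    3);

   Pos := Prices.Find ("coffee");   --  O(log n), using the key ordering
   if Pos /= No_Element then
      Ada.Text_IO.Put_Line (Integer'Image (Element (Pos)));
   end if;
end Find_Demo;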

Here I can only repeat someone’s joke regarding OCaml: “Oh, calm!” :grinning:

The program I am thinking about converts strings copy-pasted from a PDF (yes, it is a broken interface… even worse than XML) into the configuration format of a COTS product (which is XML). I guess it is a legitimate use of XML (no choice). I don’t have data-structure changes like transforming a record into a new one with an added field. One of the main steps is parsing each line and producing a record. The other deals with List.group, which takes a list of items and produces a list of lists of items. Such a function is practical in Tidyverse-style programming. But sure, I would have written things differently if I had to transform a 10GB database!

I’ve been thinking about similar things a lot recently. I started a small toy project to learn basic compiler concepts and have been learning about ASTs lately. One of the more interesting design issues for me is figuring out whether I want to go OO or use a collection of records for each node type along with a variant record for the “any node” type.

My first stab was with OO, but one of the things that nagged me is that in the following stages of the compiler I then had to write a big if / elsif / else block to determine which descendant type I was currently working with, so that I knew how to interpret the data externally after the AST was built. That seemed like it wouldn’t scale well long term, so now I have gone back to considering the variant record way, as case statement selection seems to scale better for that one part (but then I run into situations where you get into case statement hell for other operations that OO would have handled more nicely).

My past experience with variant record / composition over inheritance / etc. style approaches in larger projects I helped on, though, definitely makes me antsy about going down the variant record route. I remember going through pages and pages of case statements trying to figure out what linked to what. It wasn’t fun.

All that to say, I do see the benefits of both as well as some of the struggles.

With an OO design you would instead call a primitive operation of the node, which then dispatches. With a variant record you would write a case statement. This is the opposition Tucker is talking about.

You seem to use OO as if it were a variant record, still making a “case” statement on the type tag. This is not how OO is supposed to work, as you lose most of the advantages of OO here (except extensibility, which in the case of an AST is not as important as, say, for widgets or device drivers).

P.S. You can find an example of an OO AST in Simple Components. [see parser-examples/parser-tree] It parses an Ada 95 expression and creates the AST of it. Since it is merely an example, the only universal node method is Image, to dump the tree. The point is, there are no case statements there.

Normally yes, but the challenge is that stage 2 of the compiler (the parser) generates the AST. Later stages can’t retroactively add their operations as dispatchable routines, as the AST is already built.

Now I could go back and make the AST more kitchen sink like and design all the later stages into the AST generated by the parser, but that feels like a design inversion (now the parser needs to know about all the later stages).

I agree. I’m trying to avoid using it as if it were a variant record. The problem is that later stages need to identify “what kind of node” it is in order to do the correct operations (unless I go the kitchen sink method). Like running an interpreter on the AST: the interpreter has to walk the tree and do something for each node, but the AST doesn’t have dispatchable interpreter methods built into it. The same would apply to a code generation stage. It needs to identify what kind of node it is in order to generate code correctly (I think)?

Yes, of course you need to plan in advance what the next stages will need. Isn’t that obvious?
There is no universal design for such things. And, look, there are hundreds if not thousands of data description formats like ASN.1, XML, JSON and so on. None of them is usable without reinterpretation. Nobody cares. Why, in the case of OO, is even a minor design adjustment considered a major sin?

First of all, you do not need an AST to evaluate the code. In fact there is another example that evaluates expressions without AST generation. [parser-examples/calculator]

But, if you want to be able to evaluate the AST, just add methods to the nodes for that.
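
A minimal sketch of that approach (made-up names, not the Simple Components code): Image and Evaluate are abstract primitives of the root node, each node kind overrides them, and both tree dumping and interpretation become a single dispatching call with no case statement over node kinds.

package Expr_Ast is

   type Node is abstract tagged null record;

   function Image    (N : Node) return String  is abstract;
   function Evaluate (N : Node) return Integer is abstract;

   type Node_Ptr is access all Node'Class;

   type Literal is new Node with record
      Value : Integer;
   end record;
   overriding function Image    (L : Literal) return String  is (Integer'Image (L.Value));
   overriding function Evaluate (L : Literal) return Integer is (L.Value);

   type Plus is new Node with record
      Left, Right : Node_Ptr;
   end record;
   overriding function Image    (P : Plus) return String is
     ("(" & P.Left.Image & " +" & P.Right.Image & ")");
   overriding function Evaluate (P : Plus) return Integer is
     (P.Left.Evaluate + P.Right.Evaluate);

end Expr_Ast;

An interpreter then walks the tree with a single call, Root.Evaluate, which dispatches at every node; adding a new node kind means adding a type and its overridings, with no existing case statements to revisit.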

Yes, it is part of the design, and if you do it bottom up rather than top down, you may have to go back, maybe several times, because you ignored the requirements of the following stages. I do not say it is necessarily wrong. It is another topic, bottom up vs top down. In the case of complex algorithms, and parsing is such a thing, you might indeed want to play a bit, testing ideas before finalizing the design.

One of the major advantages of Ada OO is the relative safety of such games. If you make a change you can be almost sure that the compiler will catch potential issues. Not so with variant records and other low-level stuff.

I took a look at your examples, but even in your examples, you eventually have to fall back to case statements.

Less obvious when you are learning things as they are introduced to you. It’s difficult to design for the future while you are still in the present. Top down is much easier to do once you have experience over the whole program. When learning in stages, you sometimes have to go bottom up.

Yes, but I am currently learning about ASTs so I am trying to focus on those until I have a good grasp. I’m not working towards anything production relevant. This is a learning experience for me.

That’s what dispatching on the tag is for; what you’re describing there is using separate records or a variant.

Right, usually one starts somewhere in the middle. When learning one must really get to the bottom.

This is why having a large set of libraries is rather an impediment to learning. Ada is frequently accused of lacking libraries, but it produces competent software engineers like no other language, IMO.

Yep yep, I addressed that above.

Side note, if you have any links to online tutorials where they do the kitchen sink method for ASTs, I’m definitely interested. I don’t really have the income to afford books at the moment, so I would definitely appreciate any suggested online tutorials/blogs/etc that kinda walk through different methodologies.

Pretty much any compiler tutorial will cover it.

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

--  Scanner stuff.
type Root_Token is abstract tagged record 
   Line, Column : Natural;
end record;

--  For Delimiters.
type Simple_Token is new Root_Token with null record;

--  Anything with words, i.e. keywords or identifiers.
type Tokens is new Root_Token with record
   Lexeme : Unbounded_String;
end record;

type Keyword_Tokens is new Tokens with null record;

--  Parser stuff.
type Root_Node is tagged record...

type Expression is new Root_Node with...

type Simple_Expression is new Expression with...

--  or, with a separate root for the expression hierarchy:

type Root_Expression is tagged record...

type Simple_Expression is new Root_Expression with...

For example. Or you could have different roots for different types of hierarchy; this is what Clang does.

  1. You can skip the lexer. So no lexemes, no tokens. AST nodes are literals, identifiers, expressions, statements. Parsing can be done in a single pass.
  2. It is important not to use Unbounded_String or any types requiring heap and/or finalization. The AST is best allocated in an arena pool organized as a stack (mark and release). It is dropped as a whole with a single release call. Performance is important.
  3. Instead of Column/Line, use a general location type that specifies a source range. You want to highlight a source range rather than the first character in the messages.
  4. Using limited types is a good idea; nodes are never copied.
  5. Expressions have discriminants, e.g. the operand count (a sketch after this list illustrates points 3-5).
  6. You might need error nodes to continue compilation after error correction.
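
A minimal sketch of points 3-5 (made-up names; the arena allocation from point 2 is omitted here): a location type covering a source range, a limited root node so nodes are never copied, and a discriminant giving the operand count of an expression node.

package Ast_Nodes is

   type Source_Position is record
      Line, Column : Positive := 1;
   end record;

   type Source_Location is record      --  a range, not a single character
      From, To : Source_Position;
   end record;

   type Node is abstract tagged limited record
      Location : Source_Location;
   end record;

   type Node_Ptr is access all Node'Class;
   type Node_Ptr_Array is array (Positive range <>) of Node_Ptr;

   type Expression (Arity : Natural) is new Node with record
      Operands : Node_Ptr_Array (1 .. Arity);
   end record;
   --  e.g. a literal has Arity => 0, a binary operator Arity => 2

end Ast_Nodes;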

Yep, what we are talking about is past that part. I have that much. It’s later that becomes the problem.

Though I am thinking we are getting way too tangential at this point, so maybe we just table this for now and let the original discussion continue.