Out of curiosity, have you run into much information on type/overload resolution in the semantic analysis stage? In particular, one thing I have not wrapped my head around yet is how to handle overload / type resolution working bottom up.
So like if there is a statement like:
a.b(23).c.d := e.f;
usually the AST for the left hand side is built with something like:
selected component:
component: d
prefix:
|__selected component:
|__component: c
|__prefix:
|__function:
|__parameters: 23
|__prefix:
|__selected component:
|__component: b
|__prefix:
|__identifier: a
so normally the AST visitor would start with the node that specifies “d”, but for type / overload resolution, I would guess I would first walk the tree to get to the “a” node and then work backwards, but given functions can be overloaded, this seems pretty difficult. (I haven’t tried to do it yet, just trying to reason out how it works atm).
In my mind the AST would be: selected component: left: a right: selected component left: indexed array array: b index: constant(23) right: selected_component left: c right: d
So then I would think you would look at a and see if ‘.’ was overloaded and reinterpret the right-hand side in light of that, otherwise, descend down the right and see if b overloads it, etc.
I agree, but that’s a bit tougher to build I think and it would violate the definition of what Ada defines as a selected component (for those the right side must be an identifier, character literal, or operator name, rather than a more complex type like a selected component).
But it is definitely an approach to really look into. The other thing I considered was rather than keep using a tree structure for name related nodes, instead store those parts in a vector so the structure is flat for name related nodes and I can look at either side when I want to pretty easily.
This is the AST generated by the Ada expression parser:
. at 396:8..8
|__*() at 396:4..7
| |__. at 396:2..2
| | |__a at 396:1..1
| | |__b at 396:3..3
| |__23 at 396:5..6
|__c at 396:9..9
|__d at 396:11..11
The operation dot is combined to multiple arguments to ease semantic analysis.
For overloading resolution e.g. dot can be in a qualified name or indicate a record member or in X.all construct. In Ada you cannot just start at the bottom. In C you can, it is a bottom-up language. In Ada you need to consider all possible interpretations of all operations and identifiers. You can start at the top and follow all possible paths. When you have discarded all paths you need to consider giving a reasonable error message. It is complicated.
Looking at the dot and function call as an operator is very interesting, though I would have expected something more like:
. at 396:8..8
|__*() at 396:4..7
| |__. at 396:2..2
| | |__a at 396:1..1
| | |__b at 396:3..3
| |__23 at 396:5..6
|__. at 396:10..10
|__c at 396:9..9
|__d at 396:11..11
In your earlier output, the dot from position 10 is missing?
It does still have the disadvantage of needing to go down the tree to get to the first item (a) in the chain for semantic type analysis.
EDIT: Translating my earlier one to your output style would look like:
. at 396:10..10
|__. at 396:8..8
| |__*() at 396:4..7
| | |__. at 396:2..2
| | | |__a at 396:1..1
| | | |__b at 396:3..3
| | |__23 at 396:5..6
| |__c at 396:9..9
|__d at 396:11..11
The parser combines operations if allowed. You can forbid this behaviour for any given operation. You can also combine + and - defining a group inverse, e.g. a-b+c -> + (a, -b, c). Ada permits reordering, so you could fold constants (after sematic analysis). E.g. 1-x+2 -> 3-x. Because Ada is not bottom-up you cannot fold constants in general case.
Indexing *() has lower priority than dot. So a.b is combined first.
Indexing brackets, attribute, aggregate brackets, dot are syntactically operations as any other. Semantically they are not.