Rather than diverging from the thread on the GNAT bootstrapping NLnet project when answering OneWingedShark's message, I prefer to open a separate new thread.
First, some answers to that post.
The absence of explicit data structuring in Forth, while not a huge insurmountable problem, would be a nuisance in the following sense. Memory access in Forth is @ and ! : simple, but overly primitive when generating code from DIANA. Ada is very stack-oriented, so the primitive memory access is disp(FP(level)), that is, indirect displacement addressing on a frame pointer associated with a static nesting level. Frame pointers are best kept in a display rather than in a static chain because static depth is frequently significant in Ada (the DIANA translator, with its nested subprograms and declare blocks, has more than 8 levels, and it is safer to set a limit of 16 to 32 levels). This is why I changed my display coding from a “clever” use of the r8 to r15 x86-64 registers as an FP display to a 32-entry display in memory at the base of the stack (see the end of codi_x86_64.finc in src/codegen/fasmg of the framagit ada-83-compiler-tools project).
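To make that addressing concrete, here is a minimal sketch in Ada (the names and the depth limit are mine, not taken from the project's code) of a display kept at the base of the stack and of the disp(FP(level)) computation:

```ada
with System;                  use System;
with System.Storage_Elements; use System.Storage_Elements;

package Display_Sketch is

   --  Hypothetical display kept at the base of the stack:
   --  one saved frame pointer per static nesting level.
   Max_Static_Depth : constant := 32;

   type Static_Level is range 1 .. Max_Static_Depth;
   type Display is array (Static_Level) of Address;

   --  disp(FP(level)): a variable's address is its compile-time
   --  displacement applied to the frame pointer of its defining level.
   function Variable_Address
     (D     : Display;
      Level : Static_Level;
      Disp  : Storage_Offset) return Address
   is (D (Level) + Disp);

end Display_Sketch;
```

With the display in memory, entering a subprogram at static level L only needs to save the previous Display(L) entry and store the new frame pointer there.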
When producing an intermediate low-level representation, you need to verify where variables or variable descriptors are stacked and how they are aligned. This will be obscure in Forth. For example, an Ada 83 STRING type is a character array. Arrays need to be represented by a descriptor containing the array element size and the first and last indices (a triplet for each dimension), plus a pointer to the array content. The descriptor can be allocated on the execution stack because its size is known at compilation; the array content must be allocated elsewhere because in some circumstances its size is known only at run time. On a 64-bit addressed machine, if SIZE and the bounds are 32-bit integers, the descriptor must be ordered POINTER, SIZE, FIRST, LAST to respect alignment. The array content must be placed on another sort of stack-managed heap.
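As an illustration only (the field names and exact offsets are my assumption, not the project's layout), the POINTER, SIZE, FIRST, LAST ordering can be written down as an Ada record with a representation clause for a 64-bit target:

```ada
with System;

package Descriptor_Sketch is

   type Int_32 is range -2**31 .. 2**31 - 1;

   --  Hypothetical one-dimensional array descriptor for a 64-bit target:
   --  the 64-bit pointer comes first, the 32-bit fields follow.
   type Array_Descriptor is record
      Data  : System.Address;  --  pointer to the array content
      Size  : Int_32;          --  element size
      First : Int_32;          --  lower bound
      Last  : Int_32;          --  upper bound
   end record;

   for Array_Descriptor use record
      Data  at  0 range 0 .. 63;
      Size  at  8 range 0 .. 31;
      First at 12 range 0 .. 31;
      Last  at 16 range 0 .. 31;
   end record;

end Descriptor_Sketch;
```

Putting the 64-bit pointer first keeps it naturally aligned and avoids internal padding before the 32-bit fields; any other order would either misalign the pointer or waste 4 bytes of padding.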
These are technical details, but an effective implementation is full of such tiny details, and with Forth they would be difficult to manage.
Now, to those points:
There’s several ways you can do it:
Yes, I once thought of using TCC or something like it, but I have a profound aversion to C and all similarly improvised languages. So no to letting C components in; where something low level is unavoidable, I even prefer assembly code to C.
- Target a VM
I prefer to avoid dragging in a whole gas factory.
- P-Code is simple, but less well-known now;
The Polish Gliwice Ada IIPS theses planned to use A-code, an extension of P-code for Ada 83 (M. Ciernak). It is the same perspective as Ada/Ed's interpreted bytecode: a stack-machine code is used, which cannot easily be optimized, but generating it from DIANA is easier than generating n-uples. The Polish implementers did, however, plan to translate A-code to native x86 binary (Wierzinska's thesis).
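To illustrate the contrast (the opcodes are invented here, not actual A-code): a post-order walk of the expression tree yields the stack-machine form directly, while the n-uple form needs explicit temporaries, which is also what makes it easier to optimize later.

```ada
--  Minimal sketch of the stack-machine shape produced for
--     A := B + C * D;
--  The n-uple form of the same statement would be:
--     t1 := C * D;   A := B + t1;
package Stack_Code_Sketch is

   type Opcode is (Load, Mul, Add, Store);

   type Instruction is record
      Op      : Opcode;
      Operand : Character := ' ';  --  stand-in for a symbol reference
   end record;

   type Code is array (Positive range <>) of Instruction;

   --  No explicit temporaries: the evaluation stack holds them.
   Example : constant Code :=
     ((Load, 'B'), (Load, 'C'), (Load, 'D'),
      (Mul, ' '),  (Add, ' '),  (Store, 'A'));

end Stack_Code_Sketch;
```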
- JVM is ubiquitous, making for a good cross-compile bootstrap platform;
This is a whole gas factory. I can't.
- DOTNET is interesting, but more limited (there are implementations on non-MS OSes);
Same remark.
- Forth/SeedForth, very simple and small;
I also thought about using Forth, but after reflection and small-scale trials I abandoned the idea for the reasons evoked above: the low-level code is unreadable when it comes to verifying data placement.
- LOLITA - A Low Level Intermediate Language for Ada+
(This paper is the only ref to it, though, TTBOMK)
I did not know it; thank you for the article, you are an efficient provider of quite interesting information. Some of its remarks are interesting: they do not use n-uples but trees, they point out the basic disp(FP) addressing, and there is an allusion to macro low-level IR instructions. But there are too few details to make much progress from it.
- Interpret/execute the IR directly.
(This is technically what Graal/Truffle does, though the IR is essentially the AST.)
This is somewhat akin to the solution I adopted: the low-level IR is macro-code assembled to native code. I finally settled on this solution because it is the most direct way to get a working native executable. It has drawbacks, though: 1) the executable is monolithic, and withed units or ancestors are included as macro-text rather than as library ELF units; 2) the macros are stack-machine operations (like P-code or A-code instructions) and no optimization is done; for array accesses in loops it is impossible to hoist loop invariants out, and the same goes for common subexpressions. But given the power of present machines it is not an immediate catastrophe. I also think it is only half a problem: instead of being native code generators, the stack-machine macros could almost certainly become generators of an optimizable ad-hoc low-level IR which would itself generate the final native code. For now that overwhelms my capacities, so I am content with the present solution.
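For illustration, here is a minimal sketch of what “macro-code” means here, with macro names, levels and displacements invented for the example (they are not the ones used in codi_x86_64.finc): the generator emits textual macro calls, and the macro assembler expands each of them into a fixed native sequence.

```ada
with Ada.Text_IO; use Ada.Text_IO;

--  Hypothetical generator emitting stack-machine macro calls as text;
--  a real generator would write them into the .asm source for fasmg.
procedure Emit_Macro_Code_Sketch is

   procedure Emit (Line : String) is
   begin
      Put_Line (Line);
   end Emit;

begin
   --  A := B + C * D;  as stack-machine macro operations
   Emit ("        LOAD_LOCAL  2, 16   ; push B  (level 2, disp 16)");
   Emit ("        LOAD_LOCAL  2, 24   ; push C");
   Emit ("        LOAD_LOCAL  2, 32   ; push D");
   Emit ("        MUL_INT             ; C * D");
   Emit ("        ADD_INT             ; B + C*D");
   Emit ("        STORE_LOCAL 2, 8    ; pop into A");
end Emit_Macro_Code_Sketch;
```

Because each macro expands in isolation, an array-indexing sequence inside a loop is re-emitted and re-executed in full at every iteration, which is exactly where the missing loop-invariant and common-subexpression optimizations show up.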