Ada Bootstrap compiler funding thanks to nlnet!

I read somewhere that C++ exception handling couldn’t be efficiently expressed in C (I suppose without runtime support) in the early days of C++. So, perhaps, exceptions would be another area of complexity for implementing an Ada to C transpiler.

(A quick search turned up this: https://stackoverflow.com/questions/15464891/is-it-possible-to-write-a-zero-cost-exception-handling-in-c. In a comment, Johnathan Leffler cites: Stroustrup ‘Design and Evolution of C++’.)

Small note: C nested function is a GNU extension not supported by clang/llvm. But the GNAT frontend has support for unnesting. It’s used with gnat-llvm for example.

1 Like

New here and tinkered with Ada long ago. My question is what C compiler and standard is the project targeting? I’m curious if this project allows bootstrapping on a system that does not support GCC or clang such as Plan 9.

Hi @HawkerFrake, welcome to the forums!

I would recommend that you take a look at the live-bootstrap system, which I explained a bit about in my last post in this thread. It starts with a basic binary seed, it then uses some basic Scheme to create a barebones C compiler and LibC. Then TCC (Tiny C Compiler) gets compiled and the Fiwix kernel is used. Then MuslC, then a ton of dependencies and they arrive to GCC 4.0. Afterwards it is a progressive buildup to modern tools and modern Linux.

Hope this explains a few things,
Fer

Thank you for the info. I misunderstood the original post and did not realize this is part of an entire bootstrapping procedure for Linux.

My interest was (is?) this C bootstrap could be used outside of GCC/clang to bootstrap Ada on platforms other than Linux/Unix. Looking at the live bootstrap procedure, tinycc and mes are used which could be supplanted by kencc and Streetlisp (a work in progress minimal scheme for plan 9). Seems possible given the target compiler goals.

1 Like

No reason why it couldn’t be.

I am nostalgic of Forth too. Just remind some Forth compiler was made for some little capacity systems if not tiny, then it was quite logic to avoid a complex binary search since the number of words was limited (only 50 in the Jupiter Ace, 130 words are required by Forth83… but FigForth has 300 words and F83 more than 1000). Lower level than the machine code ? I don’t think so. FORTH was my first language which supports structured programming (IF ELSE THEN, DO LOOP…).
However, this language is quite fragile (I mean, there are no consistancy check). I won’t choose it to implement something as complex as a compiler.

1 Like

That’s why you implement it in Ada, and use a SPARK-verified SeedForth-generating backend to emit the “code” — you get all the [verified/provable] consistency from SPARK and let the Forth be the ultra-lowlevel that it is.

I see… you program an Ada Compiler in Ada, but this Ada compiler targets seedForth as a machine language. Then you have a working Ada compiler on every system that supports seedForth.

Afterward, switching the seedForth target with the actual machine language… and compile the compiler and make it faster.

1 Like

Bingo!

You can “close the loop” on it, too, by writing a SeedForth interpreter in Ada, and emitting that to run the SeedForth-as-machine-language programs you emit from the Ada compiler — then the bootstrapping of everything devolves to implementing the “vocabulary” of SeedForth on that particular architecture. Perhaps:

Package SeedForth.Core_Words is
  Subtype Byte is Interfaces.Unsigned_8;
  Type Byte_Data is Array(Byte range <>) of Byte;

  -- Assume Token_ID is an enumeration of the words.
 Type Machine_Code_Chunk is null record
    with Size => 8*256;

Words : Constant Array (Token_ID) of Aliased Machine_Code_Chunk
  with Import; -- Fill each entry with the 

Private

  Type Word_Data(Length : Byte) is record
    Data : Byte_Data(1..Length);
  end record;

ACTUAL_WORD_1 : Word_Data:=
   ( Length => 8, Data => (1,2,3,4,4,3,2,1) ) -- the actual byte-stream of the instructions,  for the word.
 with Address => Words(TOKEN_ID'FIRST)'Address;
-- and so on.
End SeedForth.Core_Words

There’s probably a better way to do it, likely using subprograms/procedure-pointers, such that the operation of the word is an assembly insertion, as with:

-- Assuming a 'VM' type which contains the SeedForth stacks.
Type Word_Ref is not null access Procedure(Object : in out VM);

  Procedure Add(Object : in out VM);
  Procedure Sub(Object : in out VM);
  --… 
  Procedure Stop(Object : in out VM);

Words : Constant Array (Token_ID) of Aliased Word_Ref:= (
  Add  => Add'Access,
  Sub  => Sub'Access,
--…  
  Stop => Stop'Access
);

-- The implementation of ADD, SUB, etc as assembly/imports in the BODY.

And THAT is how you reduce bootstrapping to the absolute bare-minimum effort.

As I am struggling with my Ada 83 compiler, here are my thoughts about Forth backend or alternatives and compiler bootstrap.

So you want a verifiable compiler that does not make things behind your back. In any case you will have an intermediate representation (DIANA in my case), if you choose 9X 2X versions, it will be more complex, good luck.

DIANA structure can be dumped and scrutinized because I have the Ada 83 front-end source in Ada 83. So I know what it produces, and I have tools to look at produced DIANA). Yes the front-end and the fasmg code generator are compiled with gnat (in fact I can half compile the compiler to DIANA with my own front-end, it works, but does not execute of course the code generator being incomplete). But even if I had a “noxious gnat” that attempted to add code in my back, the DIANA structure would not be affected in a non verifiable way.

Now that you have your intermediate representation semantically checked, how to produce an executable from this in a transparent and quick way (not necessarily optimized but acceptable) ?

The most direct way I found is to write custom macro assembly text which is assembled to native machine code with program data structuring explicited. This text can be read and visually verified.
From DIANA, I choose to write fasmg macro text (flat assembler is a general assembly machine, not a specialized machine assembler), because some macros define a stack machine operations, other macros define data structuring on stack and co-stack.
Each Ada unit is compiled to a .finc fasmg text, the program takes an include .finc defining operations macros (that emit binary native code after an ELF header, and other withed units or sub-units have their fasm .finc included also. This fasmg text is assembled by the fasmg assembler and produces direct an ELF executable for whatever machine you like (depends on the binary emitted by the macros, that is it depends on the first .finc include (“codi_x86_64.finc” in my case on Linux Ubuntu).

The choice of Forth will no be a convenient one, Forth lacks data structuring expression which is paramount in coding a stack intensive language as Ada 83 (and I suppose later versions).

To explicit the point, here is an example with source file “lis_caractere.adb” (in bin directory of my framagit project) :

with TEXT_IO;
use  TEXT_IO;
			-------------
procedure		LIS_CARACTERE
is			-------------
  C	: CHARACTER;
begin
  PUT( " Bonjour " );
  NEW_LINE(2);
LIRE_UN:
  loop
    PUT( " Entrez un caractere ! " );
    GET( C );
    PUT( " Vous avez tape : " );
    PUT( C );
    NEW_LINE;
    exit when C = 'q';
  end loop LIRE_UN;
end	LIS_CARACTERE;
	-------------

This is compiled (translated to DIANA and code_gen treated option W) with the command line

a83.sh ./ ./lis_caractere.adb W

(framagit bin directory serves for tests and contains the ADA__LIB).
This gives the following macro text (in “LIS_CARACTERE.FINC” in ADA__LIB directory) :

include 'TEXT_IO.FINC'
PRO	LIS_CARACTERE_L1		;---------- PRO LIS_CARACTERE
ELB 1					;    BODY ELAB
VAR C_disp, b				; variable bool char
					;    end elab
begin:					;---------- BDY INSTRUCTIONS
STR STR_L2, ' Bonjour '			; constante string=' Bonjour '
	LCA	STR_L2_ptr
	CALL	TEXT_IO. ,PUT_L56
	LI	 2
	CALL	TEXT_IO. ,NEW_LINE_L26
LIRE_UN:
STR STR_L4, ' Entrez un caractere : '		; constante string=' Entrez un caractere : '
	LCA	STR_L4_ptr
	CALL	TEXT_IO. ,PUT_L56
	LVA  1,	C_disp
	CALL	TEXT_IO. ,GET_L50
STR STR_L5, ' Vous avez tape : '		; constante string=' Vous avez tape : '
	LCA	STR_L5_ptr
	CALL	TEXT_IO. ,PUT_L56
	Lb  1,	C_disp
	CALL	TEXT_IO. ,PUT_L52
	LI	 1
	CALL	TEXT_IO. ,NEW_LINE_L26
	Lb  1,	C_disp
	LI	 113
	CEQ
	BT	L3
	BRA	LIRE_UN
L3:					; post loop LIRE_UN
	UNLINK 1
	RTD
excep:
endPRO					;---------- end PRO LIS_CARACTERE

There is an include for TEXT_IO which is too long for here (text_io I am struggling with for some time…).

The main procedure is launched by a standard launcher in file “LIS_CARACTERE.fas” which includes stack machine macros via “codi_x86_64.finc”

include '../../src/code_gen/fasmg/codi_x86_64.finc'

	LINK	0, loc_siz
include 'LIS_CARACTERE.FINC'
	CALL	, LIS_CARACTERE_L1
	SYSEXIT

 virtual VARzone
   loc_siz = $						; Ce n'est que là que l'on calcule la taille des globales qui sera retropropagee au LINK 0
 end virtual
</pre/

Now, with the command

fasm lis_caractere.fas

An 1.8Kb ELF executable “lis_caractere” is directly produced ! And it works (but for a minor problem in the damned text_io which puts an extraneous line feed :roll_eyes: but I know why).

All stack machine instructions are defined as macros in codi_x86_64.finc.
for example LI 1 (load immediate 1) is

;			-------------
macro			LI?	val		; Load Immediate
;			-------------
 db 0x48, 0xC7, 0xC0				; mov rax, val
 dd val
 PUSH_RAX
end macro

fasmg produces x86-64 binary in a single RWX segment (…) ELF executable.

Sorry for the length, but it gives a relatively detailed account of my (hard thought won) solution of the poor lonesome Ada 83 man.

Now, you will notice that I use a stack machine which is not far from Forth machines like J1, P16, Steamer S21 and company. There is no secret in there. But Forth has no explicit mean of Ada stack variables structuring.
You will suffer with that…

3 Likes

Hello everyone! I’m Efraim and I’m the one who applied and received funding from NLnet for the Ada bootstrap.

A bit about me: I’ve been a member of Guix (from-source Linux distribution) for about 10 years now and one of the maintainers for about 2 years. There I’ve been responsible for several of the architecture ports (aarch64, riscv64, powerpc) and have continued to keep them, and others, running with patches and tweaks over the years. I enjoy deep dives into topics while figuring out hard problems and then creating Guix packages to keep everything working going forward. The first time programming really clicked for me was while I was learning C (back around 2012, so C99), and between that and Guile (used by Guix) is often where I end up when trying to work through ideas.

Every 4-6 months it seems we get someone who comes by and asks why we don’t have GNAT available as part of our GCC package. After one of these times we ended up with Ada/Ed packaged in Guix, with some workarounds for 64-bit architectures. At the time I did some software archaeology in Debian’s archives and I was able to find some compiled packages for gnat from before the Ada frontend was merged into GCC but I wasn’t able to figure out how they had built it. I had hoped that with more time we would come across something that we could use to bootstrap GNAT but everything seemed to require an existing Ada compiler.

The plan:
The plan isn’t finalized yet, and NLnet has said that if things change they’re happy to work with me about changing the milestones. My initial plan was more from whole-cloth, using flex and bison, to try to build a new front-end for GCC that could then build enough of a working compiler in GCC that it could then be used to build GNAT for real. As I started to break down the parts that I might need I remembered that Ada/Ed was around as an Ada 83 interpreter, and I already had a copy of it in C. This led me to my now current plan; update Ada/Ed from the late 80’s so it works on more and 64-bit architectures, add support for Ada 95 (and probably 2005), write a wrapper for it similar to gdmd for gnatmake/gprmake/others as needed, and use that to build the first GCC with Ada support.

My initial target is a bunch of Linux architectures (x86_64, i686, armv7, aarch64, riscv64, powerpc64le), that is, not just x86_64, and I see no reason to not get it working on other OSes also.

I saw @HawkerFrake mention plan9 and I’d be interested in learning more about what might be required to have it run there. I don’t want to make changes that hurt portability if it’s possible.

To answer a couple of questions:
Why C?
As @Irvise pointed out, the bootstrap is only really meant to be a stepping stone to the next compiler. I haven’t been involved in the live-bootstrap project but I have been in Guix’s bootstrap process. I view it as basically 3 stages: Get to a working C compiler, progress through the decades to something more modern, build the final packages which are used to build the rest of the system. That’s the simplified version, there can be multiple rounds of building and rebuilding packages when going from one stage to the next. The C compiler specifically because so much of the software written for use for building other packages is written in C.

Why not a transpiler?
The goal here is to bridge the gap from the final C compiler to a full featured Ada compiler (GNAT specifically in this case), not necessarily to have something that is regularly used. As far as using transpiled code, it becomes problematic since it’s not the “preferred medium for making changes”, but if it’s transpiled once and then used as a base for future editing that’s generally acceptable.

What license?
From the README in Ada/Ed it looks like it is already licensed GPL2+.

6 Likes

Hey! Welcome to the community! Nice to see you here ^^

Nice! I have been using some of Guix for a while :slight_smile:

Afaik, they did not. One always required a previous version, the original one (the bootstrapping key) was never released :confused:

This looks feasible. There are already YACC (and Bison?) files available for Ada 95. Check the ayacc and aflex projects and their files.

Also sounds good! Though is it worth it to take such an old piece of code and try to update it instead of creating something with more modern tooling that is already known to work in different arches, etc? I am not opposed to the idea though!

<3

From my side, I can recommend some of the information that we had gathered throughout the years about this topic, see this link. Also, I know some of the GNAT’s internals quite well (I helped update GNAT for NetBSD and managed to create a GNAT version that worked on NetBSD/powerpc), if you need any help, I think I can lend you a useful hand.

Also, feel free to ask questions here or post a repository with your work so that people may chime in!

Best regards,
Fer

2 Likes

Hi Efraim !

I am afraid I do not see clearly what is the objective. I understand you want an Ada compiler ; an Ada 83, 95, 2005, 2012, 2022, 2032 compiler ?
Apparently you want to start with a C language compiler. And then ? Recompile Ada/Ed ? it is in C ; it was translated from Setl. It targets an interpreter executing aix files if I remember. The intermediate representation is specific, not DIANA. Well. Now let’s suppose you revive Ada/Ed (it is feasible, I recompiled the interpreter some years ago in detached pieces, aahh C building… :roll_eyes:), You have an Ada 83 compiler to a virtual machine, what do you plan to do with it ? Write an Ada 2X in interpreted Ada 83 ?

I do no understand the relation with GCC. When I had a close look at Gnat 3.15, I saw it used the GCC backend (gigi interface), I suppose it is always true. You would make another Ada front-end written in C for the GCC IR and code generator ? It does not reduce to Flex an Bison use, that is the easy syntax part, the chef’s piece is the semantic analysis and its runtime implementation.
Or you want to compile an old Gnat ? jGnat was targeting java vm and I think it is independent of GCC. But you need an Ada compiler, and perhaps an intermediate 83/9X from those years, Ada/Ed will probably not do the job, and I am not even sure that a modern Gnat will chew its far ancestor without groaning.

So…

I want to create a way to build GNAT without needing an Ada compiler in my PATH. The specific version initially isn’t the most important piece since I can compile one version and then compile later versions with that one. Later I’ll need to target at least gcc-7 since that’s the first version of gcc with riscv64 support, although I suppose with loongarch I’ll need to do even later versions.

The plan is to take Ada/Ed and improve upon it until it can be used instead of an Ada compiler when building GCC with Ada support.

Ok, got it. The problem will be :

  1. finding the earliest version of gnat, supposing it is written not too far from Ada 83. Adapt it to a modern OS/machine probably linux x86-64 or ARM (unless building on available historical machines).

  2. augment Ada/Ed in C (if feasible) for ingesting early gnat source and produce code linkable with gcc C pieces (Ada/Ed targets an interpreter…) ; or regress the early available gnat x95 compiler source to Ada 83 (which amounts to get back to an earlier than available gnat version because I would not be surprised that the very beginning of gnat is with an Ada 83 production compiler).

For the first point, The first gnat version I found is 1.67 for Sparc or OS2, nearly 400 files building with gcc 2.5.7 !
I would try to compile Ada gnat pieces with a modern gnat and -gnat83 option to detect the non Ada 83 features (features what will be necessary to add, if possible, to Ada/Ed).

For the second point, I would not bet that Ada/Ed can be augmented to the required fraction of an x95 Ada compiler to process and code an early available gnat source.

I feel this enterprise will require a considerable endeavor.

From historical perspective, it would be interesting to know with what compiler and on what machine the very early gnat has been written and compiled. Not sure it is Ada/Ed. Industry production compilers such as DEC Ada were much more powerful.

Now your project gives me an amusing idea : What portion of a very early gnat Ada source can be processed by the DIANA translator I work on. Just syntactically at first to appreciate the gap between Ada 83 source and the early gnat Ada source text. But not sure I’ll have the time, I try to have some code generated from DIANA and only Ada 83 interests me, x95 and others are too big for me and I am too old to run after Ada 2032

2 Likes

As far as I remember the first GNAT compiler was bootstrapped by an Alsys compiler. There was an announcement in comp.lang.ada around 1991 or 1992

2 Likes

Indeed, that is what happened.

By the time GNAT was being created, the Alsys compiler already existed, and we first wrote in Ada 83, and bootstrapped using the Alsys compiler on a Sun, then once we were bootstrapped, we step by step extended, so that now we use quite a lot of Ada 95 features.
Robert Dewar on comp.lang.ada (2002)

1 Like

Indeed, GNAT was initially implemented in Ada 83 using the Alsys compiler, and covered only a small subset of Ada 83. Once enough of Ada 83 was implemented, the team managed to use early GNAT to compile itself. They announced that first successful bootstrap at the Ada-Europe conference in June 1993 in Paris: I clearly remember Robert Dewar making the announcement there, “and there was much rejoicing”. :wink:

You can read about this, and much more on the early history of GNAT, in the paper “Origins and history of GNAT” by Ed Schonberg. It was published originally in Ada-Europe News, Issue 20, March 1995, and was reprinted in the 30th anniversary issue of the Ada User Journal, i.e. Volume 31, Number 1, March 2010. The republished paper is available in the digital AUJ archive at https://www.ada-europe.org/archive/auj/auj-31-1.pdf#page=39

Relevant quote:
“Another early technical (and political) decision was to use Ada itself to write GNAT. Even though the rest of GCC is written in C (close to 500K lines for the various front-ends, the common back-end, utilities, and machine description files) it was not acceptable to use C itself for the Ada component. We started by writing a compiler for a small subset of Ada83, and used a commercial compiler to compile it. In June 1993, the subset was sufficiently complete to compile itself, and the compiler was bootstrapped in time for the Ada-Europe conference. The use of Ada9X features in the compiler itself has grown substantially, as the implementation has become more complete, and GNAT currently can only be compiled with itself. The compiler uses child libraries very heavily, and tagged types more sparingly. Very little code makes uses of dynamic dispatching.”

Note that “currently” in the text above is when Ed wrote that paper, i.e. early 1995, and by that time (30+ years ago!) the implementation already used many of the Ada 95 features…

2 Likes

This is logical.

Now just try to compile today some source text from the old gnat 1.67 (whose system.ads begins with “pragma Ada_9X”, a good start :face_with_raised_eyebrow:), the gnat 13.3.0 spits errors at compilation. What would be the cost of repairing all this ?

That means that recompiling those old gnats is not job done : an Ada 83 compiler like Ada/Ed will not compile those early 9X boostrap iterations, and a modern gnat will have been so bootstrapped and enhanced toward 2X that it will not compile them either.

It is a common problem of moving versions. I recall that once I succeeded in having a Pentium cross Kalinda OS with pragma No_Run_Time, some gnat 3.15 tricks and inline assembly, some versions later it did not work anymore (and pragma No_Run_Time became obsolescent).

It is as if gnat did not cease bootstrapping from its birth to nowadays. As long as you follow the continuous eternal bootstrap by adapting, everything’s all right ; a kind of principle of relativity : when you move with the train it doesn’t seem to move. But stop ten years and come back…

Thus planning to follow again the gnat bootstrap from early 1990ies versions to nowadays gnat/gcc-7…