C++ build time anecdote
However, the team I work with collaborates with a team that works in C++, and the C++ team has much, much longer build times: on average, ours are less than 10 minutes (from scratch); theirs are roughly two hours every time they change something, no matter how small.
As someone who works on huge C++ projects, this is a red flag to me. Most projects aren't like this; builds are usually much, much quicker. The worst I've dealt with was ~30 minutes, caused by working in core headers on core types that most of the project used. Seconds are far more common, and sometimes minutes.
C/C++ builds are extremely parallelizable, but sensitive to SSD/HDD speeds, so I'd be curious what hardware they're throwing at it. A great way to slow down a big C++ build is to run it on an HDD.
Physical Design
A specification can be auto-generated from the body
In some ways yes, in other ways, no.
Due to the nature of the interface/implementation split, you often must provide certain information for the compiler to make decisions, such as the size of a type used on the stack. Rust would call this `Sized`; in Ada you'd put the type definition in the `private` section of the module; and in C++ this is part of why `private:` is a section in a class declaration in a header file, even though clients don't logically need it.
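To make that concrete, here's a minimal C++ sketch (all names like `Widget` are hypothetical). The compiler can only compute `sizeof(Widget)` for a client declaring one on the stack if the private members are visible in the header:

```cpp
// widget.h -- hypothetical public header; all names are made up.
#pragma once
#include <string>

class Widget {
public:
    Widget();
    void frob();

private:
    // Clients never touch these members, but the compiler must still see
    // them: without them it cannot compute sizeof(Widget), which it needs
    // the moment a client writes `Widget w;` on the stack.
    int counter_ = 0;
    std::string label_;
};
```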
There are also reasons not to auto-generate a spec from the body: it can leak design information from your libraries, such as types and functions private to that library. You also often don't want to export all your header files, both to avoid confusing users and to prevent others from depending on them.
In Rust you can control this with `pub` and `pub(crate)`; in C/C++ it's done with anonymous namespaces and `static` functions/variables. You can't do this generically for C/C++ though, because anonymous namespaces are relatively new (a decade or so, IIRC). For Ada, you can't because it follows the linear sequence of elaborations model and doesn't have conditional compilation to remove elements, so I think it'd be harder to generate the correct interface if you have multiple implementations (e.g. Windows/Mac/Linux versions of the same module).
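For illustration, a sketch of the C/C++ side (the filenames and function names are made up). Both an anonymous namespace and `static` give internal linkage, keeping helpers out of the library's exported symbols:

```cpp
// library.cpp -- hypothetical translation unit; all names are made up.

// Anonymous namespace: everything inside has internal linkage, so no
// other translation unit (or library client) can link against it.
namespace {
    int helper_call_count = 0;

    int clamp_to_limit(int value) {
        return value > 100 ? 100 : value;
    }
}  // namespace

// `static` at namespace scope gives the same internal linkage.
static void log_call() {
    ++helper_call_count;
}

// The only externally visible symbol; its declaration would live in the
// library's public header.
int library_process(int value) {
    log_call();
    return clamp_to_limit(value);
}
```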
I highly doubt that separating spec and implementation has a significant effect on compile time.
C++ and Ada both exhibit what's called "physical design", in addition to "logical design": the gist is that the file an element is put in affects the program structure as a whole. Lakos' book, "Large Scale C++ Software Design", is old, but goes into exceptional detail on this and on how to "levelize" designs to prevent compilation problems from spiraling out of control. (Some of the stuff in that book is out of date, but it's still a great resource.) In C++ this manifests because each translation unit is compiled down to an individual object file, which affects optimization and other concerns. (I'm omitting details here on "unity builds", which were much more popular before link-time optimization (LTO).)
Most programming languages don't deal with issues arising from physical design. The header/source split with multiple object files originated because entire programs couldn't be kept in memory; nowadays it helps with build parallelization. Reading a file off an SSD really isn't that bad these days… Lakos' book recommends redundant header guards around every include, since at the time reopening a header just to skip over everything inside its guards caused a lot of slowdown. The usual modern technique is to just use `#pragma once` instead.
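A sketch of the two idioms side by side (header and macro names are hypothetical):

```cpp
// big_header.h -- classic internal include guard:
#ifndef BIG_HEADER_H
#define BIG_HEADER_H
// ... declarations ...
#endif  // BIG_HEADER_H

// consumer.cpp -- the redundant *external* guard Lakos recommends: the
// includer tests the macro itself, so an already-seen header is never
// reopened and re-scanned just to be skipped.
#ifndef BIG_HEADER_H
#include "big_header.h"
#endif

// Modern replacement: a single line at the top of the header, and no
// external guard needed at the include site.
#pragma once
```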
I guess you might technically be able to reverse engineer the C++ name mangling and the data in object files to get this, since I think calling convention/parameter/return types could be inferred. Integer sizes especially could change per platform, which might get ugly. I'm not sure how you'd handle aliased names in this case, since I don't think they'd show up.
If you're struggling with C++ compilation times, its header/source split gives you tools and techniques to throw at the problem:
- minimize includes in header files
- prefer forward declarations over headers
- minimize inlining (note that function definitions written inside a class declaration are implicitly inline)
- minimize usage of templates (a common source of pain in the 90s, and coming back due to often-heavy template usage)
- "pointer-to-implementation (PIMPL pattern) to reduce re-compilation times for heavily changed elements on which a lot of things depend.
An advantage Ada and C++ have is that you can often prevent rebuilds of large numbers of components when specs/header files don't change. You can often limit changes to source files, so only the affected object files get rebuilt and linked.
From a tooling side, you should definitely put include-what-you-use into a CI job to give you a running log of what is actually being used. I've gotten a pretty decent speed boost just from its recommendations. Every project should have it set up.
On Linux, ccache used to be a must-have; I'm not sure if there's a more modern replacement.
Bazel uses caching and does an excellent job of parallelizing builds.
FASTBuild lives up to its name: it has caching and distributed builds built in. Even without distributed builds, a project I converted over to FASTBuild from CMake got a 4x faster clean build. It's amazing to me that more people don't use this tool.
Bazel and FASTBuild both have excellent profiling capabilities.
Rust compilation speed is a known issue, and the idea of a complete rewrite of the compiler has been floated. Rust also heavily uses monomorphization (C++-template-like compile-time code generation), which leads to bloat. It is super convenient, though, to just have a `.rs` file with everything in it (including unit tests!) and not have to worry about moving templates to a header file, etc.
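Since the mechanism is the same as C++ template instantiation, a C++ sketch shows the effect (names are illustrative); Rust does the analogous thing for each concrete type a generic function is used with:

```cpp
#include <string>
#include <vector>

// Each distinct T stamps out a separate copy of this function's machine
// code in the final binary -- the same effect Rust's monomorphization
// has on generic functions.
template <typename T>
T largest(const std::vector<T>& values) {
    T best = values.front();
    for (const T& v : values) {
        if (best < v) best = v;
    }
    return best;
}

int main() {
    // Three instantiations -> three compiled copies: largest<int>,
    // largest<double>, and largest<std::string>.
    largest(std::vector<int>{3, 1, 2});
    largest(std::vector<double>{0.5, 2.5});
    largest(std::vector<std::string>{"x", "y"});
}
```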
The one really good argument I remember seeing for separate spec files is that it forces a software engineering team to think about what they’re doing before they do it, which is especially valuable when they won’t be able to change it after sharing it. That’s a compelling argument.
If the exported headers of a library don't change, then you know that a component linking in your library won't have to recompile, though it may relink. This is useful when you're deploying new artifacts to other people, or pulling in new versions of artifacts, and you don't have to worry about a full compilation. You just leave "stitching everything back up" to the linker.
Summary
This is a complicated topic touching many elements of the lifecycle of shared libraries, static libraries, and final executables.