Towards a HAL for multiple runtimes

We currently have Ada_Drivers_Library, which targets ZFP runtimes, or at least the included drivers often don’t use interrupts or any higher-level tasking features. Also, the interfaces are typically synchronous: the procedures block until the transfer is complete. This is nice to start with, but since we see more and more tasking runtimes for microcontrollers, the question is: can this be improved?

Forcing the same API on driver implementations with different capabilities (interrupts, tasking, exceptions) is not the right way forward.
I imagine different layers with increasing abstraction and convenience for the user:

  1. HW register level: This typically does not allow for any abstraction. It can often be generated from SVD files and only contains types and constants, no functions or procedures.
  2. HW access: Functions and procedures to interact with the HW on the lowest level. A bit of abstraction is possible, e.g. for a UART there should be at least something like “Is_Transmit_Ready”, “Transmit_Frame”, “Is_Receive_Ready” and “Get_Frame” that are non-blocking. But a lot of things would be HW specific.
  3. Async buffered: No tasking, but interrupt and/or DMA controlled transfer of buffers, so that the main task can do other things while the transfer is running.
  4. Sync tasking: When you have multi-tasking, it’s often simpler to suspend a task while I/O is active instead of using interleaved asynchronous calls.
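
A minimal sketch of what the layer 2 UART interface could look like. Only the four operation names come from the list above; the package name, the Port type and the one-frame loopback that stands in for real hardware are assumptions made so the example compiles on a host.

```ada
with Interfaces;

package UART_Access is
   --  Hypothetical layer 2 interface: non-blocking, no interrupts.
   type Port is limited private;
   subtype Frame is Interfaces.Unsigned_8;

   function Is_Transmit_Ready (P : Port) return Boolean;
   procedure Transmit_Frame (P : in out Port; F : Frame)
     with Pre => Is_Transmit_Ready (P);

   function Is_Receive_Ready (P : Port) return Boolean;
   procedure Get_Frame (P : in out Port; F : out Frame)
     with Pre => Is_Receive_Ready (P);

private
   --  Stand-in for the hardware: a single holding register in loopback.
   type Port is limited record
      Holding : Frame   := 0;
      Full    : Boolean := False;
   end record;
end UART_Access;

package body UART_Access is
   function Is_Transmit_Ready (P : Port) return Boolean is (not P.Full);
   function Is_Receive_Ready (P : Port) return Boolean is (P.Full);

   procedure Transmit_Frame (P : in out Port; F : Frame) is
   begin
      P.Holding := F;
      P.Full    := True;
   end Transmit_Frame;

   procedure Get_Frame (P : in out Port; F : out Frame) is
   begin
      F      := P.Holding;
      P.Full := False;
   end Get_Frame;
end UART_Access;
```

A layer 3 or layer 4 package could then be built on these calls without touching the registers itself.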

How would you model this in Ada, so that each layer could re-use the implementation of the lower layers? Especially the fact that higher layers may only be compiled if the runtime supports this. What could be a good naming scheme for packages?

I believe that @SweetAda has been thinking about this for a while, so maybe he has opinions that he would like to share.

Best regards,
Fer

That’s kinda the point of my idea of a full IDE for VHDL+Ada+PL/SQL, albeit less ‘HAL’ and more ‘entire project’, but there’s a huge overlap in the abstracting process for both. To illustrate from “the other side”, consider a hypothetical Ada implementation where the IR is executable. This means that you can ‘compile’ and run Ada programs without ever doing the backend, and thus you can cut out a ton of work for a minimum viable product. This could be key in making the system highly portable: imagine if that VM/IR-executor were written in Forth, then the problem of porting it could be as simple as implementing Forth on the new architecture, and you have the added bonus that the IR is [typically] high-level. Architecture details become a non-issue in bootstrapping, and to fully port Ada you only have to (a) write the IR-to-native backend, (b) compile-recompile-run in a deterministic recipe to materialize a fully native-running Ada-in-Ada emitting native machine code, and (c) hook into the OS, if applicable [i.e. file system, DB, whatever].

Now, the specifics here are directly opposite: for a HAL you need to be concerned foremost with (1) the hardware, and (2) the abstracting itself. — But I still think you could use this methodology: cheat. If you don’t have to do work, avoid it until you have to do it. As I showed, you could have an Ada implementation that essentially doesn’t have a backend, saving that for when you actually need it [think efficiency-requirements or proof/certification like airlines].

There are a few ways that I think you could do this:

  1. Generic systems/subsystems and layers.
    1. Use the generic-formal parameters to abstract/parametrize the layer/subsystem,
    2. this allows for static-polymorphism, and
    3. forces the body to program against the parameters.
  2. Some sort of OOP, think like a “VCL for hardware”,
    1. This is more familiar to most programmers,
    2. it can be done as a wide, robust, well-formed library,
    3. it’s also probably the most difficult to use cleanly.
  3. A sort of multi-implementation where the project system selects the appropriate implementation for a compilation
    1. Pro: This would use separate and the implementation’s project-system,
    2. Con: this would use the implementation’s project-system.

I’m trying to cover (3) and (4) with my A0B project. The primary API is asynchronous, with active use of interrupts/DMA/callbacks. There is also the A0B.Await package, which helps to convert callbacks into synchronous calls when tasking is available (or puts the processor into a sleep state when there is no tasking).

For example, here is an abstract asynchronous API for a generic I2C bus and drivers of typical I2C devices

Its implementation for STM32F401

Driver for the SCD40 sensor, which has an exotic view of the I2C protocol

I would make Read/Write primitive operations, inherited from an abstract I/O device interface, e.g.

   type Abstract_Device is limited interface;
   procedure Read (Device : in out Abstract_Device; ...) is abstract;

Thanks for sharing your insights. @damaki put (1) into the runtime itself (in package Interfaces), which I think is a good idea. It does not add any overhead and at least the more featureful runtimes will need to access timers and interrupt controllers anyway. I’m not sure if layer (2) should also be part of the runtime, as the runtime would then depend on the HAL interface. On the other hand layer (2) could be pure convention to include a package with pre-defined names, same as for the Ada standard library.

Regarding repository/crate organization, @godunko your approach is at the opposite side of the spectrum compared to Ada_Drivers_Library. You have a single repo/crate for each driver while Ada_Drivers_Library is a mono-repo. Maintaining a consistent set of abstractions might be easier in a single repo, while I also prefer to have actual implementations for devices in a different place.

@OneWingedShark @dmitry-kazakov Ada_Drivers_Library already uses an object oriented approach, even though runtime dispatching is not required most of the time. A single device typically only has controllers of one kind that can be handled by the same code. Only the address of the peripheral and a bit of runtime state is required per instance. I heard that compilers might be able to see that only a single implementation for an abstract interface is available and remove dispatching when not needed. The overhead is negligible for layers (3) and (4), but not for layer (2) where you might have a function call to poll for availability in a tight loop.

Dispatching is always removed when not needed, i.e. when the controlling type(s) are statically known. That is how Ada’s OO works. It has zero run-time overhead compared to other methods (generics etc).

Whether the dispatching table has only one element, the compiler cannot know. Note that even the linker cannot know that. This is because you can load a module (for example a dynamically linked library) at run-time and add another instance in there. That would dynamically expand the dispatching table. It is quite a useful feature if support for loadable drivers is planned.
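
A small sketch of the static versus dispatching distinction. Abstract_Device is borrowed from the snippet earlier in the thread; the Name operation and the UART type are made up for the illustration.

```ada
with Ada.Text_IO;

procedure Dispatch_Demo is

   package Devices is
      type Abstract_Device is limited interface;
      function Name (D : Abstract_Device) return String is abstract;

      --  Hypothetical concrete device implementing the interface.
      type UART is limited new Abstract_Device with null record;
      overriding function Name (D : UART) return String is ("UART");
   end Devices;
   use Devices;

   procedure Show (D : Abstract_Device'Class) is
   begin
      --  The operand is class-wide, so this call dispatches at run time.
      Ada.Text_IO.Put_Line (Name (D));
   end Show;

   U : UART;

begin
   --  The operand has the specific type UART, so the call is statically
   --  bound: no dispatching table lookup is involved.
   Ada.Text_IO.Put_Line (Name (U));

   Show (U);
end Dispatch_Demo;
```

A polling loop at layer (2) that works on an object of a specific type therefore pays no dispatching cost; the cost only appears where a class-wide view is used.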

Note that my runtimes define registers only for the peripherals they need for themselves; they intentionally don’t define registers for all peripherals (in hindsight, perhaps it would have been better to put them under System instead of Interfaces).

For (1) there is svd2ada which can generate register definitions in Ada based on SVD files. This is what I use for my runtimes. We could go the Rust route and start distributing Peripheral Access Crates (PAC) which expose the generated register bindings. See Rust’s rp2040-pac as an example.

For (2) we can continue to develop HALs, like the lovely rp2040_hal. We also have the hal crate which provides interfaces to derive from for operations common to every device.

To be honest I don’t know what to say, because interrupts and tasking are things yet to be fully exploited in SweetAda.

So far, in SweetAda there is little interrupt management. It is true that nearly every platform has at least an interrupt timer, but it is used only to bump the tick count, to have a notion of time. Anyway a casual user could implement a localized management if he wants.

For example, a slightly better use is in the NE2000 driver (still quick and dirty), tied to the PC-x86 platform. Here I used a very primitive form of dispatching with a parameter that reflects the address of the object being interrupted. I think it is the same technique used in GUI contexts.

I’ve yet to think about how to implement a more structured form of interrupt management, probably with overriding interfaces and class types. What worries me is performance, because under ISR you cannot waste time due to language restrictions. Generics are worse than evil, especially when talking to peripherals. Unfortunately Ada is excellent when defining register layouts, but not when defining whole peripherals (I’ve seen code hitting the kbyte barrier and still computing a register address offset, which is one of the reasons to implement in SweetAda every form of unit overriding and optimization).

Last but not least, I see a big challenge in making interrupt and asynchronous handling suitable for dynamic processing and driver-agnostic code, whether or not it is confined to the runtime.

Another idea is to have a two-stage runtime, where the 1st stage manages the low level and exposes an API to a 2nd stage runtime which you could use. But these are ideas that are extremely difficult to implement, at least for me (not in the technical sense, but in the time available).

Tasking will only make things worse. From time to time I reflect on some Ada decisions (formalized in the annexes) that, IMHO, should be wholly reconsidered.

We’ll see.

Why do you say this?
Do you have examples?

I like the idea of splitting this into layers, though the higher level ones get kinda fuzzy. My thinking on these things is still evolving, you’ll find a lot of my existing code contradicts what I’m about to write.

Layer 1 (register types, address space)

The Peripheral/Register/Field hierarchy generated by svd2ada is generally fine, but not always. GPIO registers are very repetitive and are better represented as arrays than records with one field per pin. You can index an array, with the record fields you’re stuck writing long case statements that will almost certainly generate suboptimal code. SVD can express arrays if it’s a linear arrangement, but some chips (STM32 AFRL/AFRH) have interleaved registers that make it a headache. Many vendors don’t add these dimensioned groups to the SVD, so you need to patch the SVD.

Memory access width is important and Volatile_Full_Access isn’t good enough. Depending on bus topology, some peripherals require that all memory accesses are 32 bits wide; shorter accesses generate odd behavior (e.g. writing 0x42 becomes 0x42424242). The compiler can’t know which memory ranges require word-size accesses, so you have to tag every register type with Volatile_Full_Access, Object_Size => 32. This limits the use of arrays where you have repeating 4-bit wide fields, packed into a couple dozen 32-bit wide words. Logically, it’s an array with Component_Size => 4, but you can’t actually do that because then you get short writes, so you have to have nested or multidimensional arrays split on 32-bit boundaries and do arithmetic to figure out the offsets.
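
The offset arithmetic can be sketched roughly like this. The layout (16 four-bit fields spread over two 32-bit words) and all names are assumptions. A plain array stands in for the peripheral; on real hardware the words would be volatile objects at the peripheral’s address.

```ada
with Interfaces; use Interfaces;

package Nibble_Fields is
   --  Hypothetical layout: 16 four-bit fields in two 32-bit words.
   type Field_Index is range 0 .. 15;
   type Field_Value is range 0 .. 15;
   type Word_Array is array (0 .. 1) of Unsigned_32;

   procedure Set
     (Regs : in out Word_Array; Index : Field_Index; Value : Field_Value);

   function Get
     (Regs : Word_Array; Index : Field_Index) return Field_Value;
end Nibble_Fields;

package body Nibble_Fields is
   Fields_Per_Word : constant := 8;
   Bits_Per_Field  : constant := 4;

   procedure Set
     (Regs : in out Word_Array; Index : Field_Index; Value : Field_Value)
   is
      Word  : constant Natural := Natural (Index) / Fields_Per_Word;
      Shift : constant Natural :=
        (Natural (Index) mod Fields_Per_Word) * Bits_Per_Field;
      Mask  : constant Unsigned_32 := Shift_Left (16#F#, Shift);
      Old   : constant Unsigned_32 := Regs (Word);  --  one 32-bit read
   begin
      --  One 32-bit write; no byte or halfword access is generated
      --  by this source-level arithmetic.
      Regs (Word) :=
        (Old and not Mask) or Shift_Left (Unsigned_32 (Value), Shift);
   end Set;

   function Get
     (Regs : Word_Array; Index : Field_Index) return Field_Value
   is
      Word  : constant Natural := Natural (Index) / Fields_Per_Word;
      Shift : constant Natural :=
        (Natural (Index) mod Fields_Per_Word) * Bits_Per_Field;
   begin
      return Field_Value (Shift_Right (Regs (Word), Shift) and 16#F#);
   end Get;
end Nibble_Fields;
```

Setting field 9 to 16#A#, for instance, touches only word 1 and leaves it holding 16#A0# if it was zero before.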

svd2ada doesn’t add SPARK accessibility aspects (Effective_Writes, Async_Readers, etc) to the register types. It just marks everything as Volatile, which is vague. CMSIS 5.0 added some field-level accessibility and information to SVD (eg. some fields are read-only, others are write-to-clear), which we have no way to express in Ada.

My preference now is to write the types by hand and forget SVD even exists. It takes a bit more work to transcribe the datasheet, but you can reason about the best way to lay out the records and arrays as you’re writing them. Sometimes the best thing is to just define Unsigned_32 at a specific memory address and move on; you get better codegen this way.

Layer 2

Most vendor HALs throw pointers all over the place or use C++ classes here. I don’t really see the benefit. For example, no two devices will handle configuring an I2C peripheral the same way, so your abstract I2C interface will just ignore configuration entirely, except maybe some notion of bus speed and address size. Users of the HAL need to port their code to your specific implementation either way, so the abstraction is leaky by design. The compiler never optimizes away the pointers completely and in Ada you pay for it at elaboration time.

I’m starting to introduce a split between this “Low Level” layer and the higher level HAL-compatible implementation in the RP.GPIO driver on the 3.x branch of rp2040_hal. I’m not sure yet if these two levels of abstraction belong in the same package, subpackages, or separate crates entirely, but the split is clear. The low level functions should contain very little logic, never blocking. On Cortex-M0 this usually means it compiles down to 4-8 instructions that can be inlined.

Layer 3/4

I think timer interrupts are the thing to focus on. If you get that right, all of the other interrupts are easier to reason about.

Timer interrupt handling belongs in the runtime. You cannot use delay statements and tasks cannot be scheduled properly if timer interrupts are handled outside the runtime. You can paper over it by doing something like package Timer renames Ada.Real_Time; then having another implementation for light runtimes that provides a similar spec. But really, just put it in the runtime.

For higher level drivers, I like to define a package with procedure Interrupt; that the library user is responsible for calling at the right time. The user should decide whether to use the Attach_Handler aspect/pragma or export a symbol that gets linked into the vector table. Exporting a symbol is simpler and lower latency, but less portable and ignores all of Ada’s tasking and rendezvous logic. There are good use cases for both, so this is not something a driver author should decide.
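
A rough sketch of that split, using the exported-symbol variant because it compiles without any platform-specific interrupt names. Sensor_Driver, Board_Glue and the symbol name "Sensor_IRQ_Handler" are all made up; the real symbol has to match whatever the startup code or vector table expects.

```ada
package Sensor_Driver is
   --  The driver only provides the handler body. It does not decide
   --  how (or whether) the handler is attached to an interrupt.
   procedure Interrupt;

   function Event_Count return Natural;
end Sensor_Driver;

package body Sensor_Driver is
   Count : Natural := 0 with Volatile;

   procedure Interrupt is
   begin
      Count := Count + 1;  --  stand-in for the real interrupt work
   end Interrupt;

   function Event_Count return Natural is
   begin
      return Count;
   end Event_Count;
end Sensor_Driver;

--  User-side glue: export a symbol for the vector table. The
--  alternative would be a protected procedure with the Attach_Handler
--  aspect that calls Sensor_Driver.Interrupt.
package Board_Glue is
   procedure IRQ_Handler
     with Export, Convention => C, External_Name => "Sensor_IRQ_Handler";
end Board_Glue;

with Sensor_Driver;

package body Board_Glue is
   procedure IRQ_Handler is
   begin
      Sensor_Driver.Interrupt;
   end IRQ_Handler;
end Board_Glue;
```

The driver package stays the same in both cases; only the glue on the user’s side changes.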

I really like generics for drivers. The resulting code is easier to read and reason about without repeating This. in front of everything. I like to think of the generic formals as defining virtual machine instructions with the narrowest interface that accomplishes what I need. For example, this LCD driver has two procedures for setting pin states and a slightly higher level procedure to clock out 8 bits to a SPI peripheral. Everything platform specific gets encoded in the implementations of those procedures. The driver is simple to port to a new device, even if there’s no existing HAL… Just fill in the blanks with writes to the SPI and GPIO peripheral registers and you’re done.
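
To make the “narrowest interface” idea concrete, here is a sketch of such a generic. This is not the actual LCD driver mentioned above; the formal names and the chip-select sequence are assumptions.

```ada
with Interfaces;

generic
   --  Everything platform specific is supplied by the instantiator.
   with procedure Set_Chip_Select (High : Boolean);
   with procedure Set_Data_Command (High : Boolean);
   with procedure SPI_Write (Data : Interfaces.Unsigned_8);
package Generic_LCD is
   procedure Command (Code : Interfaces.Unsigned_8);
   procedure Data (Value : Interfaces.Unsigned_8);
end Generic_LCD;

package body Generic_LCD is

   --  Shared sequence: select data or command mode, assert chip
   --  select (assumed active low), clock out one byte, release.
   procedure Send (Is_Data : Boolean; Byte : Interfaces.Unsigned_8) is
   begin
      Set_Data_Command (High => Is_Data);
      Set_Chip_Select (High => False);
      SPI_Write (Byte);
      Set_Chip_Select (High => True);
   end Send;

   procedure Command (Code : Interfaces.Unsigned_8) is
   begin
      Send (Is_Data => False, Byte => Code);
   end Command;

   procedure Data (Value : Interfaces.Unsigned_8) is
   begin
      Send (Is_Data => True, Byte => Value);
   end Data;

end Generic_LCD;
```

Porting then means writing three small procedures against the target’s GPIO and SPI registers and instantiating the package with them.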

I think generics (good in other contexts) cause code bloat in this case.

Examples existed in ancient versions of SweetAda. I was forced to not inline the low-level nested subprograms that actually do the read/write, in order to keep code fairly small. But in this case, you have the overhead of calling a lot of subprograms.

As an example, take the Z8530 SCC chip. You have to deal with

  1. a base address

  2. the kind of connection for pins A//B and D//C

  3. the stride of the addresses, that is, the offset between register D and register A

  4. accessing a register with an index register

and 1), 2) and 3) are all combined with an address multiplier factor. Some of these parameters are taken from a record descriptor, some are locally assigned.

I assure you that when you do register programming, you will find tons of instructions in the executable. And you end up recognizing that every device requires a different set of generics. So your plan to make the whole thing agnostic fails.

The situation collapses when you plan to drive the chip by means of both normal memmapped instructions and x86 port I/O instructions (I know, now obsolete).

But perhaps it was my fault in writing Ada code. Yes it’s likely.

It’s way too easy to deal with memmapped peripherals in, e.g., modern MCU ARM Cortex devices. You write a bitfield record, overlay it at its address and handle it almost like a memory variable.
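
A minimal sketch of that overlay pattern. The register layout is invented, and the record is overlaid on a local variable so that it runs on a host; on a real MCU the address would be the peripheral address from the datasheet.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Overlay_Demo is

   type Bits_2  is mod 2**2  with Size => 2;
   type Bits_29 is mod 2**29 with Size => 29;

   --  Hypothetical control register: one enable bit, a 2-bit mode.
   type Control_Register is record
      Enable   : Boolean;
      Mode     : Bits_2;
      Reserved : Bits_29;
   end record
     with Size => 32;

   for Control_Register use record
      Enable   at 0 range 0 .. 0;
      Mode     at 0 range 1 .. 2;
      Reserved at 0 range 3 .. 31;
   end record;

   --  Stand-in for the memory-mapped register.
   Backing : aliased Unsigned_32 := 0 with Volatile;

   --  The bitfield view, overlaid at the same address.
   Ctrl : Control_Register
     with Import, Volatile, Address => Backing'Address;

begin
   Ctrl.Enable := True;
   Ctrl.Mode   := 2;

   --  The raw word now reflects both fields; the exact value depends
   --  on the target's bit order.
   Ada.Text_IO.Put_Line (Unsigned_32'Image (Backing));
end Overlay_Demo;
```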

I’ve not had as much trouble with generics causing bloat in my own embedded projects, but I also came from a world where I learned how to properly design template classes for embedded use in C++. I tend to use a non generic backend for shared functions that each of the generics utilizes and that minimizes any code bloat I have usually. That lets me separate out the things I want to inline (usually in the generic packages) with the things I don’t want to inline (backend package stuff).
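
One way to picture that split, as a sketch with made-up names: the backend is an ordinary package that is compiled once, and the generic is a thin, inlinable wrapper that only adds type safety. The formal type is assumed to be an enumeration, so Command'Pos is never negative.

```ada
package Command_Backend is
   --  Non-generic: compiled once and shared by every instance.
   procedure Send_Code (Code : Natural);
   function Last_Code return Natural;
end Command_Backend;

package body Command_Backend is
   Last : Natural := 0;

   procedure Send_Code (Code : Natural) is
   begin
      --  Placeholder for the bulky shared logic (queueing, retries...).
      Last := Code;
   end Send_Code;

   function Last_Code return Natural is (Last);
end Command_Backend;

generic
   type Command is (<>);  --  assumed to be an enumeration type
package Generic_Commands is
   procedure Send (C : Command) with Inline;
end Generic_Commands;

with Command_Backend;

package body Generic_Commands is
   procedure Send (C : Command) is
   begin
      --  Per instance this is just a conversion plus one call.
      Command_Backend.Send_Code (Command'Pos (C));
   end Send;
end Generic_Commands;
```

With a compiler that duplicates generic bodies, each instance then costs only the small wrapper rather than a copy of the backend logic.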

If you can do that, you do not need generics.

Generics are used when you need to substitute a type or a closure (local subprogram body). In other cases plain subprograms would do the job.

Note that a compiler (not GNAT) may use shared generic bodies. Ada 83 was designed to allow this.

This is more an implementation issue: GNAT does repetition rather than shared generics.
Several implementations do use shared generics, Janus does, and DEC did.

This doesn’t make sense to me, consider Bluetooth: there are a set of commands/signals which all Bluetooth devices must implement. This is atop some radio, which could be one of any number of implementations, yet even so really ought to have a generally standard interface. Whether this interface is realized via separate, or generic, or a combination is largely irrelevant, but this begins the whole idea of the abstraction. (And, incidentally, can showcase one of Generic’s lesser lauded features: static polymorphism.)

For a small, nigh useless example respecting the lowlevels of hardware, consider this:

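Generic
   Type Register is (<>);
   -- Word was not declared anywhere; made a formal modular type here.
   Type Word is mod <>;
   -- Add and Zero are called as statements in the body, so they have to be procedures.
   With Procedure Add ( Object : in out Register; Value : in Word ) is <>;
   With Procedure Zero( Object : in out Register ) is <>;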
Package Accumulation is
   Generic
      Object : in out Register;
   Package Accumulator is
      Procedure Reset;
      Procedure Add( Value : Word );
   End Accumulator;
End Accumulation;

Package Body Accumulation is
   Package Body Accumulator is
      Procedure Reset is
      Begin
         Zero( Object );
      End Reset;

      Procedure Add( Value : Word ) is
      Begin
         Add(Object, Value);
      End Add;
   End Accumulator;
End Accumulation;

And here we have implemented a tiny generic abstraction of a minimal accumulator. All that the client has to do is supply a Register type plus Add and Zero subprograms. This illustrates that it is possible to factor/parameterize a whole subsystem into a generic structure… and isn’t that the whole point of a HAL?

Interesting.
Could you give a tiny example?

Commonly, yes.
But you’re leaving out that you can use them in composing a type, view, or other subprogram.

You should use formal packages or else child packages for that (having a generic interface):

generic
   type Register is (<>);
   type Machine_Word is mod <>;
   --  Procedures rather than functions, since they are called as statements.
   with procedure Add (Object : in out Register; Value : in Machine_Word) is <>;
   with procedure Zero (Object : in out Register) is <>;
package Generic_Register is
end Generic_Register;

with Generic_Register;

generic
   with package Register is new Generic_Register (<>);
package Generic_Accumulator is
   procedure Reset;
   --  Machine_Word is reached through the formal package.
   procedure Add (Value : Register.Machine_Word);
end Generic_Accumulator;

or

generic
package Generic_Register.Generic_Accumulator is ...

No, it is not impossible. But trying is worth the lesson. :grinning:

P.S. Generic interfaces are not fully supported in Ada. E.g. specialization, when you instantiate only a part of the interface. I would not propose going that road because introducing generics was a mistake. The efforts should be better spent on removing them from Ada.

Unfortunately SweetAda uses GNAT.

This doesn’t make sense to me either. Bluetooth is a standard protocol shared by various devices. I have problems writing read/write access routines for different registers with different record layout types, located at different addresses, and with different bus modes.

Like I said to another user, if you are willing (but I absolutely don’t want you waste time on that), take a look at the z8530 unit in SweetAda and see if you can implement generics with lower code impact.

Did you implement something like that on real hardware? You have a generic, then you have to call external functions. You have generics for every register type, and in some devices you have tons of them. Every register has its own layout, and is located where it happens to be. “Zero” and “Add” functions don’t mean anything to me; I have to program different bits in different locations.

The “standard way” does not always fit everywhere, and things are not always straightforward.

Look, I really appreciate your effort, and I will take into account what you wrote. I implemented in the past things more or less like your example. Simply I was not satisfied with the compiler outcome. Perhaps it was my fault or an improper compiler setup.

That doesn’t mean that I won’t change my mind, and in the future I could rework my code with a better understanding of generics, plus your advice.

PS

I have generics in my code. But the operations carried out differ only in a few things, like the register type, the value and maybe another parameter. And they don’t call anything else outside the body.