How to easily use AI Skills with your Ada projects

Hello,

Quick comment:

If you are using LLMs in your Ada projects, have a look at GitHub - AdaCore/skills · GitHub.

In some LLMs (Claude and Cursor for sure, I could not figure out how to do it in Codex) you can give a command like:

‘/plugin marketplace add GitHub - AdaCore/skills · GitHub

And you will have instant access to Alire, GNATprove and GNATtest skills to help you with your projects.

I have been blown away with how strong these LLMs are nowadays in lifting Ada to SPARK Silver. The key is to keep the scope contained and to provide good test cases to avoid the LLM to either hallucinate, or to ‘cheat’.

Enjoy,

Mairk

Over the months, I built an agent system with a /commands workflow.
I have a main command, /dev, that acts as an orchestrator; it runs:
1. Requirement writing: functional specification
2. Technical specification
3. .ads writing
4. Tests writing: with a scope that only allows this command / sub-agent to read .ads files from the project, no .adb (even if it’s not written yet)
5. If it’s a lib, it writes examples
6. Implementation: writes .adb files, loops until tests pass, or stops if the tests seem bad. There is a gate here too, that does not allow the command / sub-agent to modify the code of tests / examples
7. Cleanup and documentation writing
I run this setup inside Claude Code (with Opus as orchestrator and tech-spec writing, Sonnet for the other parts) and in OpenCode, where I tried different models in addition to Anthropic’s: GPT and Gemini Pro are good; I’m currently trying Kimi K2.6 as orchestrator / tech-spec, and GLM 5.1 for the other parts.
With this setup I have no real troubles. There is a human gate at the end of steps 1 and 2, so that it has a good base to implement things in autonomous. But always read what it produces at the end, even if the code produced mostly does what your requirement said, it can be inefficient, messy, etc.
In addition to that, I use a custom skill automatically applied on .ads/.adb to force the LLM to use a defined coding standard, plus a note of self-learned lessons (e.g., not to declare a package in the declarative part of a protected object :upside_down_face:).

An example of project that maily use that workflow: termicap

PS: I use the skills of AdaCore for few weeks now, I do not see really helpful things for now, since I already something similar for alire in my agents knowledge. I have not tried their SPARK skills but it seems very interesting

Uhhh… I think your project is private

Edit: nope! The link needs a missing “p” at the end. GitHub - adarium-labs/termicap: Cross-platform terminal capability detection for Ada/SPARK — TTY, color, size, Unicode, terminal identity · GitHub

Yep, I just fixed the link

Hi @markhermeling,

you may want to try and get the skill included in https://www.skills.sh/ which seems to be a fairly well-known and used Skill aggregator.

Best,
Fer

As far as I remember, it is automatically added to skills.sh when you npx skills add <owner/repo>

Thanks for sharing!

I agree that the skills may not add a great deal of tool-knowledge if your agents have already ‘learned’ this themselves. It really benefits people that have not explored this.

For example, with Alire, my experience is that if you ask an LLM to install Alire without a skill, it will muddle through. It will try a few things, get stuck, fuddle for a while and then figure it out. The skill documents the places where it could get stuck and gives it a easier start time.

the GNATprove skill I have found very beneficial in lifting Ada to SPARK. Lots of good nuggets have been encoded in it, including your comment of an orchestrator and leaf agents to lift individual subprograms.

Thanks for the reference, this tool is really interesting.

I especially note the “find-skill” skill, which explains to the LLM how to look up the relevant skills on this site when the user asks it to do something!

Hi Mark and all,

I wanted to bring this topic during the SPARK office hours, but we ran out of time (for a good reason!). I have been working with AI and Ada/SPARK for about a month now. I have mostly used open-source models (so no GPT, Claude nor Gemini). The ones I have used most were GLM 5.1, DeepSeek v4, Qwen 3.6 (3.7 a bit too, but that is not open!), KiMi, MiMo and more recently Nemotron 3 Ultra. I would like to share my experience with these models (all used through https://opencode.ai/)

  1. They are all terrible at idiomatic Ada. The most modern ones and most capable ones are substantially better. I would assume that leading models (GPT, Gemini, Claude) are even better, but I have not tried.
    1. They are pretty good at writing basic Ada 95/2005, but they always fail with modern Ada features. Some examples are: using Pragma instead of with aspects. Indicating that something is Pure in a comments, but not documenting it. They generally do not use Ada 2012 expression functions function XXX () is ();. They have some issues with how private works and the package hierarchy…
    2. They do not use modern Ada 2022 features even though they would produce much cleaner code. They also do not use aspects almost at all (Ada 2012 iterators, type annotations…).
    3. They are terrible at properly utilising the Ada Standard library. They are not that good at understanding that Strings can be created dynamically regardless of the size (within reason).
    4. They have to be explicitly told that not null constant access is the preferred way to use access. Though when told to use a similar model to rusts memory model, it tend to improve quite a lot.
    5. They suffer a lot of pitfalls that new users also make. For examples, using subtype everywhere instead of proper type yyy is new xxx; generating a lot of in out in functions, using when others => too often or defaulting to an if ... elsif ... else construct when it should not (or not using ranges in whens). They could easily create subtypes to help themselves with set theory but they do not…
  2. They tend to go into long changes/refactors and then when compilation fails they go on a long round-about way of fixing things. This consumes quite a lot of time and tokens, driving up the costs a lot.
  3. Only a few models can do proper SPARK proofs. GLM, Qwen and Nemotron can. DeepSeek v4 can, but only basic analysis and when guided. Other more basic models fail quite a bit in this area.
  4. They generally misuse Ada tools. They generally do not understand Alire, GPRbuild, etc. GPRbuild is specially annoying as AIs tend to duplicate A LOT of code/statements there and structure it very poorly.
    1. They are terrible with GNAT and GNATprove switches.
    2. They barely use GDB when asserts/exceptions happen. They go on long reasoning chains which again consume a lot of time and money. When told to use GDB properly, then tend to find the issue in no time…

After some time working with them, I found the following to be quite useful:

  1. Creating skills. Just like what @markhermeling has been doing, creating skills is a good way to help the AIs.
    1. As I do not have time to properly write skills/summaries, what I did is tell the AIs to read the GNAT, SPARK, GPRbuild, Alire and Ada 2022 documents from the internet and summarize them for their own consumption. This was done with several AIs, all doing the same task one after the other and further refining the resulting documents.
    2. This greatly helped with code quality, SPARK contracts and adding some modern features. However, I was expecting a much more substantial improvement… For example, the use of the Ada standard library did not improve, nor did code structuring…
    3. This kind of work only yields improvements if the AI can handle a lot of tokens/memory, this means that only AIs with contexts of >500k tokens can work (project specific documentation plus Ada’s). This is becoming more and more common thanks to 1M contexts being the new standard.
  2. Forcing styles/structure every time I saw something strange.
    1. For example, forcing the no when others unlike specified, not null constant access, etc…
    2. Explicitly state Ada 2022 use.
    3. Using several AIs to write and review code (AI orchestration).
  3. Telling AIs to also review projects such as SPARKlib to learn the different techniques, code styles, structures, etc; and document those too.
  4. Allowing them to have as much compiler/GNATprove feedback as possible and let them be autonomous
    1. This tends to create long running tasks and contexts, which is problematic in terms of time and token/money consumption.
    2. In some instances the AIs devolve into madness and get stuck quite severely. In this cases human intervention greatly helps.

In conclusion, I think things are good and rapidly improving. However, I think having a proper, curated set of modern Ada documents for AIs to consume is what truely helps with their lack of Ada/SPARK knowledge.

Would it be okay to create some skills for “best Ada 2022 practices” and “SPARK enablement” too?

Best regards,
Fer

P.S: I really liked the office hours, I was also super happy to see random new people join and willing to learn more and discover Ada/SPARK. It was great to have heavy hitters from AdaCore and the wider community sharing their vast knowledge! 10/10

Thanks for the summary @Irvise, very thorough.

I am surprised as to your statement that the open models are terrible at Ada. The Frontier models are actually very good, so that is likely a gap that will be closed in the next 6 months maybe? Hard to tell.

Have you been using the AdaCore provided skills? How would you suggest integrating your learnings into those skills? We VERY MUCH appreciate collaboration in from that perspective.

As to your approach with skills: yes, that is what we do too. Use an LLM to do something, see it struggle, then have it summarize and manually edit the skill. Important with skills is progressive disclosure, not giving the LLM all the information at the beginning, but to feed it information as it needs it through references.

Where do you run your models? Do you have your own machine with GPU to run them on, or do you use a cloud instance of these open models?

Another learning is that context matters. The agentic workflow where you use agents to improve small parts of a program works best, but this of course does not scale to all workflows.

I need more time to sit down and go through your post in more detail as there is a lot of good content in there. Let’s dedicate the next office hours instance (June 19th) on the topic as to how people are using AI and then we can see how we can dig deeper into that content.

I would like to participate to the session, but it’s during my working hours…

That is my guess. From some comments that have been raised here in the forums, it seems that models are becoming better and better with time, specially with Ada. From my small testing time, it does seem that reasoning models can learn it quite fast even if they don’t have prior knowledge of some Ada parts.

I have not for two reasons. The simplest one, I am trying to learn, so no help there desired. Secondly, because I did not see an easy way of doing so. Since your post I have learned that my tool (OpenStudio) can detect skills for mayor AIs and also feed them to others, so I now know that I can so somewhat easily. Nevertheless, I will use them in the future and report back. I want to see those skill improve.

Good question! The documents that were created by the AIs are very AI friendly and formatted, not so much human-like. Your documents seem more human written. I suppose the style would not matter much though. Maybe I could open a PR with a reviewed document generated for the AI and then we can discuss what we should change? Also, the AI generated documents have references to the project I am using them on and I would need to clean that too…

As you and other people are part of AdaCore, I would highly recommend that you try to feed an AI all the documentation of each tool and maybe also code (both public, semi-public and maybe private) to see some real examples of the tools in use. Then ask it to generate a skill or summary document for each of those tools. That is more or less what I have done, but you probably have more “tokens” to burn and much much much more knowledge/documentation to feed the AI.

I am currently paying for the Go and Zen models in OpenCode. Go models do not run on US servers while Zen generally run on the US. OpenCode can also connect itself to your own self-hosted system (I have a friend doing that) or another hosted solution. I have seen that thanks to open models become more well-known and capable, there has been an explosion of companies that rent compute/GPU to allow the users to run those open models in their servers.

Sure, feel free to message me here or via email in case you would like any extra info. I think I will be available on the 19th, so I could probably do a 10 minute introduction or showcase and then we can have a discussion on it.

Best regards,
Fer

Sorry Heziode. There is another community event that @Irvise runs monthly that is typically on a Saturday.

That’s the Ada Monthly Meeting, typically on the first Saturday afternoon (European time) of the month. Due to the summer holidays, though, not sure when the next edition will be held.

Tomorrow, Friday June 12, the Ada community meets in-person in Västerås Sweden as well as online at the 3rd Ada Developers Workshop, colocated with AEiC2026…

Are these meetings recorded?

I can’t join either - that is on a popular family holiday is the Nordics

It would be nice to be able to watch them, when there is time

/Björn

The monthly meetups usually are: https://www.youtube.com/watch?v=hKBnZcU12cY&list=PLlAtvZuAzANYsE4Y5JxevLlqiaUtk82fL

Thanks, I’ll watch them during summer holidays.

But I referred - a bit unclearly - to the “Ada SPARK Office Hours

Are those recorded?

Hi Björn,

The past two have not been recorded, I will ask the audience next time if they mind being recorded.

Regards

Mark