How to easily use AI Skills with your Ada projects

Hi Mark and all,

I wanted to bring this topic during the SPARK office hours, but we ran out of time (for a good reason!). I have been working with AI and Ada/SPARK for about a month now. I have mostly used open-source models (so no GPT, Claude nor Gemini). The ones I have used most were GLM 5.1, DeepSeek v4, Qwen 3.6 (3.7 a bit too, but that is not open!), KiMi, MiMo and more recently Nemotron 3 Ultra. I would like to share my experience with these models (all used through https://opencode.ai/)

  1. They are all terrible at idiomatic Ada. The most modern ones and most capable ones are substantially better. I would assume that leading models (GPT, Gemini, Claude) are even better, but I have not tried.
    1. They are pretty good at writing basic Ada 95/2005, but they always fail with modern Ada features. Some examples are: using Pragma instead of with aspects. Indicating that something is Pure in a comments, but not documenting it. They generally do not use Ada 2012 expression functions function XXX () is ();. They have some issues with how private works and the package hierarchy…
    2. They do not use modern Ada 2022 features even though they would produce much cleaner code. They also do not use aspects almost at all (Ada 2012 iterators, type annotations…).
    3. They are terrible at properly utilising the Ada Standard library. They are not that good at understanding that Strings can be created dynamically regardless of the size (within reason).
    4. They have to be explicitly told that not null constant access is the preferred way to use access. Though when told to use a similar model to rusts memory model, it tend to improve quite a lot.
    5. They suffer a lot of pitfalls that new users also make. For examples, using subtype everywhere instead of proper type yyy is new xxx; generating a lot of in out in functions, using when others => too often or defaulting to an if ... elsif ... else construct when it should not (or not using ranges in whens). They could easily create subtypes to help themselves with set theory but they do not…
  2. They tend to go into long changes/refactors and then when compilation fails they go on a long round-about way of fixing things. This consumes quite a lot of time and tokens, driving up the costs a lot.
  3. Only a few models can do proper SPARK proofs. GLM, Qwen and Nemotron can. DeepSeek v4 can, but only basic analysis and when guided. Other more basic models fail quite a bit in this area.
  4. They generally misuse Ada tools. They generally do not understand Alire, GPRbuild, etc. GPRbuild is specially annoying as AIs tend to duplicate A LOT of code/statements there and structure it very poorly.
    1. They are terrible with GNAT and GNATprove switches.
    2. They barely use GDB when asserts/exceptions happen. They go on long reasoning chains which again consume a lot of time and money. When told to use GDB properly, then tend to find the issue in no time…

After some time working with them, I found the following to be quite useful:

  1. Creating skills. Just like what @markhermeling has been doing, creating skills is a good way to help the AIs.
    1. As I do not have time to properly write skills/summaries, what I did is tell the AIs to read the GNAT, SPARK, GPRbuild, Alire and Ada 2022 documents from the internet and summarize them for their own consumption. This was done with several AIs, all doing the same task one after the other and further refining the resulting documents.
    2. This greatly helped with code quality, SPARK contracts and adding some modern features. However, I was expecting a much more substantial improvement… For example, the use of the Ada standard library did not improve, nor did code structuring…
    3. This kind of work only yields improvements if the AI can handle a lot of tokens/memory, this means that only AIs with contexts of >500k tokens can work (project specific documentation plus Ada’s). This is becoming more and more common thanks to 1M contexts being the new standard.
  2. Forcing styles/structure every time I saw something strange.
    1. For example, forcing the no when others unlike specified, not null constant access, etc…
    2. Explicitly state Ada 2022 use.
    3. Using several AIs to write and review code (AI orchestration).
  3. Telling AIs to also review projects such as SPARKlib to learn the different techniques, code styles, structures, etc; and document those too.
  4. Allowing them to have as much compiler/GNATprove feedback as possible and let them be autonomous
    1. This tends to create long running tasks and contexts, which is problematic in terms of time and token/money consumption.
    2. In some instances the AIs devolve into madness and get stuck quite severely. In this cases human intervention greatly helps.

In conclusion, I think things are good and rapidly improving. However, I think having a proper, curated set of modern Ada documents for AIs to consume is what truely helps with their lack of Ada/SPARK knowledge.

Would it be okay to create some skills for “best Ada 2022 practices” and “SPARK enablement” too?

Best regards,
Fer

P.S: I really liked the office hours, I was also super happy to see random new people join and willing to learn more and discover Ada/SPARK. It was great to have heavy hitters from AdaCore and the wider community sharing their vast knowledge! 10/10