SPARK Verified Postfix Expression Calculator

I got inspired by @streaksu’s Ironclad work and @OneWingedShark’s idea to write a verified seedForth interpreter in SPARK. That seemed a bit large as a starting SPARK project, so I wrote a SPARK verified postfix expression calculator.

I had a great time doing this, so I might try to fork that calculator into a simple Forth interpreter. I don’t have formal methods training, but I wrote up my experience if anyone is curious.

I would really appreciate any guidance on the current code or related post from those with formal methods experience. The only reference I have right now is McCormick/Chapin’s “Building High Integrity Applications with SPARK” book, but that has been very helpful so far.

19 Likes

Nice project! I’d love to see you continue ^^

Thank you! Could you please also add the prover output to the README in GitHub - pyjarrett/postfix_calc: A verified postfix calculator? This way, other non-Ada users can see what it looks like without having to install and run anything.

2 Likes

Very nice!
Glad I was somewhat inspiring.
(Reading the article now.)

Regarding " Floating point simplifications", what I’ve found useful in CoAP-SPARK for proving floating point properties is the COLIBRI prover. It doesn’t come with the Alire distribution of gnatprove, but can be separately obtained from the GNAT Community 2021 edition. Then, just putting the binary on the path is enough to make gnatprove find it and use it when requested by the arguments. This workflow automating the proof process of my project is doing the downloand and extraction from GNAT Community.

1 Like

Hi Manu and everybody else,

you can download the Colibri and Colibri2 (still highly in beta) from their website: Colibri

Their repositories (where the binaries come from) are the following:

I am pointing this out as they are open source projects that have seen quite a bit of development since the GNAT Community pack was discontinued. I believe it may be also much easier to download the binaries (or compressed binary) directly and just put it into the Path. I have tested Colibri and it did not do much better than the others, but as @mgrojo points out, each solver has their strengths, and Colibri’s lies in numeric stuff rather than program flow or similar topics.

I hope this helps you all!
Best regards,
Fer

1 Like

Using the latest Colibri or Colibri2 releases from the repositories, as you mention, was my first approach, but they aren’t really compatible with gnatprove yet, as AdaCore confimed me through the GNAT Academic program. But the tool fails silently and you don’t see any difference. The working version is Colibri 2020.9, but building from source is not trivial. I finally discovered that the correct version was present in the latest Community Edition. The license is LGPL since 2020, so there are no possible worries about using or distributing it.

1 Like

Oh, that explains a few things I saw during my testing…

That is actually a bit surprising to me as all the tools are using Why3 and there are already four provers supported (Z3, alt-ergo, cvc5 and Coq with some tweaks…). Oh well… That is quite sad :confused: I know however that AdaCore was looking/working in Colibri, which is fully confirmed by the fact that it was distributed with the Community Edition…

1 Like

Sorry to necro-post,

I did some testing today with Colibri… and… I managed to make v1 work, and in its latest version 2025.06!! Colibri2 also works, but it is increeeeeeeeeeeeeeedibly slow. There are many options that Colibri2 has that may make it much much faster, but I cannot tell, I have not played around with them nor will I. Nevertheless, their proving performance is quite bad with SPARKNaCl, so I do not see any benefit in them compared to CVC5, Alt-Ergo or Z3. Maybe they only shine in their very niche domain…

Anyhow, trying Colibri2 is easy. Getting it to work just required updating the Why3 database (not as indicated in the SPARK/GNATprove documentation, as it seems to be out-of-date) and passing the .why3.conf file to gnatprove as documented. Here is the result of the proof:

=========================
Summary of SPARK analysis
=========================

------------------------------------------------------------------------------------------------------------
SPARK Analysis results        Total         Flow                           Provers   Justified      Unproved
------------------------------------------------------------------------------------------------------------
Data Dependencies               198          198                                 .           .             .
Flow Dependencies                 1            1                                 .           .             .
Initialization                  462          357                     29 (colibri2)           .            76
Non-Aliasing                      .            .                                 .           .             .
Run-time Checks                1179            .    242 (Trivial 5%, colibri2 95%)           .           937
Assertions                      337            .    38 (Trivial 55%, colibri2 45%)           .           299
Functional Contracts            164            .    62 (Trivial 16%, colibri2 84%)           .           102
LSP Verification                  .            .                                 .           .             .
Termination                     113          103                                 .           .            10
Concurrency                       .            .                                 .           .             .
------------------------------------------------------------------------------------------------------------
Total                          2454    659 (27%)                         371 (15%)           .    1424 (58%)


max steps used for successful proof: 2

============================
Most difficult proved checks
============================

No check found with max time greater than 1 second

and the command invocation was gnatprove --why3-conf="/home/fernando/.why3.conf" --level=4 --counterexamples=off -j14 --prover=colibri2 --report=statistics -d --verbose --debug-save-vcs --steps=5000. Bumping the allowable steps to 20000 increases the proof time substantially and it returns a better result, but the time cost is so baaaaad…

=========================
Summary of SPARK analysis
=========================

------------------------------------------------------------------------------------------------------------
SPARK Analysis results        Total         Flow                           Provers   Justified      Unproved
------------------------------------------------------------------------------------------------------------
Data Dependencies               198          198                                 .           .             .
Flow Dependencies                 1            1                                 .           .             .
Initialization                  462          357                     47 (colibri2)           .            58
Non-Aliasing                      .            .                                 .           .             .
Run-time Checks                1179            .    537 (Trivial 4%, colibri2 96%)           .           642
Assertions                      337            .    72 (Trivial 29%, colibri2 71%)           .           265
Functional Contracts            164            .    83 (Trivial 15%, colibri2 85%)           .            81
LSP Verification                  .            .                                 .           .             .
Termination                     113          103                      4 (colibri2)           .             6
Concurrency                       .            .                                 .           .             .
------------------------------------------------------------------------------------------------------------
Total                          2454    659 (27%)                         743 (30%)           .    1052 (43%)


max steps used for successful proof: 2

============================
Most difficult proved checks
============================

sparknacl-sign.adb:339:16: loop invariant after first iteration proved in max 2 seconds
sparknacl-hashing-rfsb509.adb:152:34: use of an uninitialized variable proved in max 2 seconds
sparknacl-hashing-rfsb509.adb:146:34: range check proved in max 1 seconds
sparknacl-hashing-rfsb509.adb:146:41: overflow check proved in max 1 seconds
sparknacl-car.adb:117:50: use of an uninitialized variable proved in max 1 seconds
sparknacl-car.adb:118:22: use of an uninitialized variable proved in max 1 seconds
sparknacl-mac.adb:107:39: overflow check proved in max 1 seconds
sparknacl-car.adb:80:22: assertion proved in max 1 seconds
sparknacl-car.adb:119:22: use of an uninitialized variable proved in max 1 seconds
sparknacl-mac.adb:107:39: index check proved in max 1 seconds

But now… How did I make a modern version of colibri v1 work? Well, nothing that I tried worked. But then I had the idea of substituting the executable name of colibri2 by colibri in the gnatprove.conf file. I hoped that that would trick gnatprove into thinking it was calling colibri2, a purely external solver, but in reality, it would be calling colibri under the hood… And my hopes were answer successfully!! The colibri executable was called without issue and it ran some proofs. So it seems that some of the internals of gnatprove are messing the execution or calling of colibri v1… Here are the results.

NOTE, I WOULD SWEAR THIS ACTUALLY WORKED, BUT AFTER TRYING TO REPEAT THE RESULTS, I CANNOT MAKE IT WORK ANYMORE ;-;


Just to put things into perspective, these are the results generated by solely using Alt-Ergo, which was not much faster:

=========================
Summary of SPARK analysis
=========================

---------------------------------------------------------------------------------------------------------
SPARK Analysis results        Total         Flow                           Provers   Justified   Unproved
---------------------------------------------------------------------------------------------------------
Data Dependencies               198          198                                 .           .          .
Flow Dependencies                 1            1                                 .           .          .
Initialization                  462          357      98 (Trivial 3%, altergo 97%)           .          7
Non-Aliasing                      .            .                                 .           .          .
Run-time Checks                1179            .    1153 (Trivial 2%, altergo 98%)           .         26
Assertions                      337            .     301 (Trivial 6%, altergo 94%)           .         36
Functional Contracts            164            .     157 (Trivial 8%, altergo 92%)           .          7
LSP Verification                  .            .                                 .           .          .
Termination                     113          103                      10 (altergo)           .          .
Concurrency                       .            .                                 .           .          .
---------------------------------------------------------------------------------------------------------
Total                          2454    659 (27%)                        1719 (70%)           .    76 (3%)


max steps used for successful proof: 131427

============================
Most difficult proved checks
============================

sparknacl-sign.adb:446:13: assertion proved in max 10 seconds
sparknacl-sign.adb:557:16: assertion proved in max 9 seconds
sparknacl-sign.adb:577:16: loop invariant after first iteration proved in max 9 seconds
sparknacl-sign.adb:535:16: assertion proved in max 8 seconds
sparknacl-sign.adb:546:16: assertion proved in max 7 seconds
sparknacl-sign.adb:489:19: loop invariant after first iteration proved in max 7 seconds
sparknacl-sign.adb:512:28: assertion proved in max 6 seconds
sparknacl-sign.adb:481:28: range check proved in max 5 seconds
sparknacl-sign.adb:520:16: assertion proved in max 5 seconds
sparknacl-sign.adb:523:28: assertion proved in max 5 seconds

========================

I also tried the yices-smt2 prover, which I have seen used in the FPGA world quite a bit. I just required the detection of why3 and adding it to the <gnatprove-dir>/share/spark/gnatprove.conf file so that GNATprove would not complain that it is an unknown prover. The results are quite underwhelming, but it makes sense, as it was designed specifically with computer architectures/FPGAs in mind…

I hope this sheds some light on the topic. Best regards,
Fer

1 Like