Sorry to necro-post,
I did some testing today with Colibri… and… I managed to make v1 work, and in its latest version 2025.06!! Colibri2 also works, but it is increeeeeeeeeeeeeeedibly slow. There are many options that Colibri2 has that may make it much much faster, but I cannot tell, I have not played around with them nor will I. Nevertheless, their proving performance is quite bad with SPARKNaCl, so I do not see any benefit in them compared to CVC5, Alt-Ergo or Z3. Maybe they only shine in their very niche domain…
Anyhow, trying Colibri2 is easy. Getting it to work just required updating the Why3 database (not as indicated in the SPARK/GNATprove documentation, as it seems to be out-of-date) and passing the .why3.conf file to gnatprove as documented. Here is the result of the proof:
=========================
Summary of SPARK analysis
=========================
------------------------------------------------------------------------------------------------------------
SPARK Analysis results Total Flow Provers Justified Unproved
------------------------------------------------------------------------------------------------------------
Data Dependencies 198 198 . . .
Flow Dependencies 1 1 . . .
Initialization 462 357 29 (colibri2) . 76
Non-Aliasing . . . . .
Run-time Checks 1179 . 242 (Trivial 5%, colibri2 95%) . 937
Assertions 337 . 38 (Trivial 55%, colibri2 45%) . 299
Functional Contracts 164 . 62 (Trivial 16%, colibri2 84%) . 102
LSP Verification . . . . .
Termination 113 103 . . 10
Concurrency . . . . .
------------------------------------------------------------------------------------------------------------
Total 2454 659 (27%) 371 (15%) . 1424 (58%)
max steps used for successful proof: 2
============================
Most difficult proved checks
============================
No check found with max time greater than 1 second
and the command invocation was gnatprove --why3-conf="/home/fernando/.why3.conf" --level=4 --counterexamples=off -j14 --prover=colibri2 --report=statistics -d --verbose --debug-save-vcs --steps=5000. Bumping the allowable steps to 20000 increases the proof time substantially and it returns a better result, but the time cost is so baaaaad…
=========================
Summary of SPARK analysis
=========================
------------------------------------------------------------------------------------------------------------
SPARK Analysis results Total Flow Provers Justified Unproved
------------------------------------------------------------------------------------------------------------
Data Dependencies 198 198 . . .
Flow Dependencies 1 1 . . .
Initialization 462 357 47 (colibri2) . 58
Non-Aliasing . . . . .
Run-time Checks 1179 . 537 (Trivial 4%, colibri2 96%) . 642
Assertions 337 . 72 (Trivial 29%, colibri2 71%) . 265
Functional Contracts 164 . 83 (Trivial 15%, colibri2 85%) . 81
LSP Verification . . . . .
Termination 113 103 4 (colibri2) . 6
Concurrency . . . . .
------------------------------------------------------------------------------------------------------------
Total 2454 659 (27%) 743 (30%) . 1052 (43%)
max steps used for successful proof: 2
============================
Most difficult proved checks
============================
sparknacl-sign.adb:339:16: loop invariant after first iteration proved in max 2 seconds
sparknacl-hashing-rfsb509.adb:152:34: use of an uninitialized variable proved in max 2 seconds
sparknacl-hashing-rfsb509.adb:146:34: range check proved in max 1 seconds
sparknacl-hashing-rfsb509.adb:146:41: overflow check proved in max 1 seconds
sparknacl-car.adb:117:50: use of an uninitialized variable proved in max 1 seconds
sparknacl-car.adb:118:22: use of an uninitialized variable proved in max 1 seconds
sparknacl-mac.adb:107:39: overflow check proved in max 1 seconds
sparknacl-car.adb:80:22: assertion proved in max 1 seconds
sparknacl-car.adb:119:22: use of an uninitialized variable proved in max 1 seconds
sparknacl-mac.adb:107:39: index check proved in max 1 seconds
But now… How did I make a modern version of colibri v1 work? Well, nothing that I tried worked. But then I had the idea of substituting the executable name of colibri2 by colibri in the gnatprove.conf file. I hoped that that would trick gnatprove into thinking it was calling colibri2, a purely external solver, but in reality, it would be calling colibri under the hood… And my hopes were answer successfully!! The colibri executable was called without issue and it ran some proofs. So it seems that some of the internals of gnatprove are messing the execution or calling of colibri v1… Here are the results.
NOTE, I WOULD SWEAR THIS ACTUALLY WORKED, BUT AFTER TRYING TO REPEAT THE RESULTS, I CANNOT MAKE IT WORK ANYMORE ;-;
Just to put things into perspective, these are the results generated by solely using Alt-Ergo, which was not much faster:
=========================
Summary of SPARK analysis
=========================
---------------------------------------------------------------------------------------------------------
SPARK Analysis results Total Flow Provers Justified Unproved
---------------------------------------------------------------------------------------------------------
Data Dependencies 198 198 . . .
Flow Dependencies 1 1 . . .
Initialization 462 357 98 (Trivial 3%, altergo 97%) . 7
Non-Aliasing . . . . .
Run-time Checks 1179 . 1153 (Trivial 2%, altergo 98%) . 26
Assertions 337 . 301 (Trivial 6%, altergo 94%) . 36
Functional Contracts 164 . 157 (Trivial 8%, altergo 92%) . 7
LSP Verification . . . . .
Termination 113 103 10 (altergo) . .
Concurrency . . . . .
---------------------------------------------------------------------------------------------------------
Total 2454 659 (27%) 1719 (70%) . 76 (3%)
max steps used for successful proof: 131427
============================
Most difficult proved checks
============================
sparknacl-sign.adb:446:13: assertion proved in max 10 seconds
sparknacl-sign.adb:557:16: assertion proved in max 9 seconds
sparknacl-sign.adb:577:16: loop invariant after first iteration proved in max 9 seconds
sparknacl-sign.adb:535:16: assertion proved in max 8 seconds
sparknacl-sign.adb:546:16: assertion proved in max 7 seconds
sparknacl-sign.adb:489:19: loop invariant after first iteration proved in max 7 seconds
sparknacl-sign.adb:512:28: assertion proved in max 6 seconds
sparknacl-sign.adb:481:28: range check proved in max 5 seconds
sparknacl-sign.adb:520:16: assertion proved in max 5 seconds
sparknacl-sign.adb:523:28: assertion proved in max 5 seconds
========================
I also tried the yices-smt2 prover, which I have seen used in the FPGA world quite a bit. I just required the detection of why3 and adding it to the <gnatprove-dir>/share/spark/gnatprove.conf file so that GNATprove would not complain that it is an unknown prover. The results are quite underwhelming, but it makes sense, as it was designed specifically with computer architectures/FPGAs in mind…
I hope this sheds some light on the topic. Best regards,
Fer