Hey all — I wanted to share a project I’ve been working on that I think this community might find useful.
Steelman is a QLoRA fine-tune of Qwen2.5-Coder-14B-Instruct, trained specifically on compiler-verified Ada 2022 code. It runs locally (Ollama, llama.cpp, etc.) and doesn’t need a cloud API.
I built it because every frontier model I tried was genuinely bad at Ada. Claude, GPT, Gemini — they all produce code that looks plausible but won’t compile. So I started generating Ada pairs, verifying them with GNAT (-gnat2022 -gnatwa), and training on only the code that actually passes the compiler.
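For anyone curious what "compiler-verified" means in practice, here's a minimal sketch of that kind of filter. This is my own reconstruction, not the author's exact pipeline: it assumes GNAT is installed and invoked through `gcc`, and that a sample passes only if it compiles with zero errors and zero warnings under the flags mentioned above.

```python
# Sketch of a compiler-verification filter (reconstruction, not the exact
# pipeline): keep only training pairs whose Ada source compiles cleanly
# under GNAT in Ada 2022 mode with all warnings enabled.
import subprocess
import tempfile
from pathlib import Path

# Flags from the post; -c means "compile only, no link".
GNAT_FLAGS = ["-c", "-gnat2022", "-gnatwa"]

def compiles_cleanly(ada_source: str, unit_name: str = "main") -> bool:
    """Return True if GNAT accepts the source with no errors and no warnings.

    Note: GNAT requires the file name to match the compilation unit name,
    so unit_name must agree with the procedure/package in ada_source.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / f"{unit_name}.adb"
        src.write_text(ada_source)
        result = subprocess.run(
            ["gcc", *GNAT_FLAGS, str(src)],
            cwd=tmp, capture_output=True, text=True,
        )
        # -gnatwa prints warnings to stderr without failing the build,
        # so require both a zero exit code and an empty stderr.
        return result.returncode == 0 and not result.stderr.strip()

def filter_pairs(pairs):
    """Keep only (prompt, ada_code) pairs whose code passes the compiler."""
    return [(p, code) for p, code in pairs if compiles_cleanly(code)]
```

A stricter variant could add `-gnatwe` (treat warnings as errors) and drop the stderr check, but the explicit check keeps the pass criterion visible.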
Results on a custom compilation benchmark (923 prompts, scored by whether the generated code compiles):
- Steelman R5: 68.6%
- Claude Opus 4.6: 40.3%
- Qwen base (untuned): 35.0%
- Claude Sonnet 4.6: 27.5%
It also scores 47.1% pass@1 on HumanEval-Ada (MultiPL-E), up from 34.4% for the base model. As far as I can tell, these are the first published Ada pass@1 results for any model.
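For readers unfamiliar with the metric: pass@1 is the fraction of problems solved by a single generated sample. When you draw n samples per task and c of them pass, the commonly used unbiased pass@k estimator can be sketched as follows (this is the standard formula, not anything specific to Steelman's evaluation harness):

```python
# Unbiased pass@k estimator: the probability that at least one of k
# samples (drawn without replacement from n, of which c are correct)
# passes. Averaged over tasks, this gives the reported pass@k.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per task, c = correct samples, k = budget."""
    if n - c < k:
        # Too few failures to fill k slots: at least one sample must pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 1 this reduces to "did the single sample pass", which is the pass@1 number quoted above.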
I should be upfront — I’m not an Ada programmer by trade. I learned enough Ada to read and validate the generated code, and every training pair is compiler-verified, but I’m sure experienced Ada developers would spot style issues or patterns that the model picked up from its training data. Feedback from people who actually write Ada professionally would be incredibly valuable for improving the dataset.
The model, GGUF, and dataset are all on HuggingFace:
- Model: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1
- GGUF: https://huggingface.co/the-clanker-lover/steelman-14b-ada-v0.1-GGUF
- Dataset: https://huggingface.co/datasets/the-clanker-lover/steelman-sft-ada
If anyone tries it and has feedback — especially on the quality of the Ada it produces — I’d love to hear it. The next training round is in progress and community input would directly improve the model.