Disclosure of incorporation of LLM outputs in (e.g., Alire-indexed) projects

waleedmebane · June 18, 2026, 2:22pm

There are currently some open-source projects that ban or restrict contributions containing LLM outputs, or that require disclosure of such outputs in contributions. See Zig and OpenJDK for bans, and Linux and Julia for disclosure policies. A draft policy requiring disclosure has been proposed for Rust. A blog post notes that AI content disclosure might be required in the EU in August and proposes that the OCaml package repository incorporate a text field for projects to voluntarily disclose AI use. It was further discussed on the OCaml discourse forum.

Projects that do not wish to depend on LLM outputs, or that want to document such dependencies, would benefit from a field in Alire or some other disclosure convention (e.g., one maintained outside Alire).

I couldn’t find anything like that by searching the Alire documentation, the GitHub repo, or this forum, but I have also never used Alire to distribute any software, so I may have missed something obvious in Alire. If so, I apologize. Otherwise, what are your opinions on how this could best be accomplished and how disclosure could be encouraged? Thanks!

cantanima · June 18, 2026, 2:33pm

How would it be enforced? Is AI-generated content always that easy to detect?

What if it’s AI-generated, then modified by the programmer? For example, I used an LLM to generate some Ada code for me a month or two ago, then went over it pretty carefully, changing variable names to something more meaningful, adding comments, etc.

My concern is that any policy being suggested is essentially an honor policy; i.e., practically unenforceable. I’d be happy to know if I’m wrong.

waleedmebane · June 18, 2026, 2:53pm

I wasn’t presuming that it would be required. It could also be encouraged or requested but, in any case, enabled/facilitated by a convention. I would be in favor of requiring it, but I don’t expect that it could be enforced reliably.

If the result would be considered derived from the LLM output, such as in your example but also in examples in which more than variable names were changed, even if none of the verbatim output remains, I would wish to see that disclosed. It does not have to use the same text/tag as for verbatim outputs if that is a concern.

waleedmebane · June 18, 2026, 3:21pm

I think that an honor policy is a good approach in this context. The package or project creators only need to disclose, and the primary risk to them from disclosure is that people who prefer not to use software incorporating LLM outputs might not use their software.

ThyMYthOS · June 18, 2026, 6:44pm

What about a general rating system? I’m much more interested in the general quality of crates than in whether they contain contributions from LLMs.

Heziode · June 19, 2026, 4:58am

On my side, I currently use a disclosure in the README of adarium-labs/termicap on GitHub (cross-platform terminal capability detection for Ada/SPARK: TTY, color, size, Unicode, terminal identity):

## AI use in this project

For transparency: a substantial part of this codebase, including Ada source, tests, examples, ADRs, requirements, and most of this README, was drafted with the help of generative-AI tools, primarily Claude Opus. The detection algorithms themselves aren't model inventions; they come from a survey of established terminal-capability libraries in other ecosystems (`supports-color` and `terminal-size` in Rust/Node.js, `termenv` in Go, `rich` and `blessed` in Python, `chafa` in C, `JLine` in Java, plus a few more), cross-checked against published terminal specifications and consolidated into the StrictDoc requirements set under [`docs/requirements/`](docs/requirements/). What the AI was used for is the translation of that prior art into idiomatic Ada/SPARK and the scaffolding of tests and docs around it.

Every committed change is reviewed, tested, and owned by a human. The AI is treated as a power tool, not as an author. The same standard is expected from outside contributions; see below.

Plus a subsection in the contributing section:

### AI-assisted contributions

You may use generative-AI tools (Claude Code, Copilot, Codex, Cursor, anything else) when writing code, tests, docs, or commit messages. The rules are short:

1. **You own the code you submit.** If you opened the PR, you're on the hook for understanding every line of it, for testing it, for getting it past the test suite and the project's coding standard ([`docs/ada-style-guide.md`](docs/ada-style-guide.md)). *"The model wrote it"* is not an answer to a review comment.
2. **PRs opened directly by an AI agent or autonomous bot will be closed without review.** The only exception is automation a maintainer of this repository runs on their own initiative.
3. **Disclose substantial AI involvement** in the PR description. A one-line note is enough, e.g. *"drafted with Claude Opus, reviewed and adjusted by hand"*. Trivial autocomplete or rename suggestions don't need disclosure; whole functions, generated tests, large refactors, or new architecture do.
4. **Watch the licensing.** Make sure the AI tool's terms and any training-data attribution constraints are compatible with the project's Apache-2.0 WITH LLVM-exception license. If you're unsure, don't submit it.

If a PR can't survive these rules, please don't open it. It saves everyone time.

Maybe we could add a field like `description`, but for explaining the use of AI in the project?

jcmoyer · June 19, 2026, 12:51pm

I am not a lawyer, but I don’t think that LLM output can be licensed if the code the model was trained on has incompatible licenses. I would personally never depend on code with dubious licensing, so I think it should be disclosed.

Yes, it’s an honor system, and yes people are getting away with license laundering today. It will not necessarily be that way in the future. Regulators are slow to move and it will probably take some high profile legal battles to figure it out.

krischik · June 20, 2026, 6:51am

I’m using GPL and don’t plan to change the license. I’m concerned that the proposed rules could unintentionally exclude GPL-licensed projects.

This raises a broader question: do we want to add more restrictions that might limit contributions?

Right now there are 15 open pull requests in the alire-index backlog, some over a month old. The moderators already have a lot on their plate reviewing existing rules — adding new ones could make that even harder.

As context, the Pi-Ada-Tutorial I’m working on relies heavily on AI assistance because I don’t have enough free time outside my regular job. Changes like this could make it harder for similar community-driven projects to be included, or even push people toward maintaining their own index forks.

A small community like ours might struggle with the overhead of forks or lost projects.

PS: Should we include the actual owner of the index (@mosteo) in discussions about significant policy changes?

Heziode · June 20, 2026, 8:20am

I do not see the problem. Can you elaborate, please?

This constraint is light enough to allow a wide range of contributions, IMHO. It only refuses automated workflows without a human in the loop.

krischik · June 20, 2026, 3:04pm

Just to clarify: automated workflows have never been proposed for alire-index. We rely on human review by volunteers, which is why we’re actively trying to get more people involved (and why the new volunteer guidelines are being worked on).

I think the specific license requirements you suggested come from what works well for your own project (Apache-2.0 with LLVM exception is a very business-friendly choice). However, when GPL projects see their license explicitly called out or restricted, it can come across as quite discouraging.

Since alire-index is meant to be a general index for the Ada community, we should be careful not to unintentionally exclude valid contributions. @mosteo has been quiet lately (new job + newborn), so we’re trying to handle this carefully while we still have limited reviewer bandwidth.

Heziode · June 20, 2026, 5:58pm

Oh damn!

Sorry Martin, I think there’s a misunderstanding.

The example I posted from my Termicap project is… an example. It’s just meant to show how I, in my case (as an Alire user / crate “creator”), disclose the use of AI and how I “regulate” it in my project.

The things about Apache-2.0 WITH LLVM-exception license and control of non automated workflow is specific to my project Termicap. It is not intended to be used “as-is” by Alire.
For Termicap, and some of my future projects, I chose the Apache-2.0 WITH LLVM-exception license instead of GPL to be business-friendly, which in turn is very restrictive for my project’s dependencies, since I cannot depend on GPL-licensed code.

mosteo · June 20, 2026, 6:15pm

I agree with @cantanima that the best we can aspire to is an honor system. Have a boolean flag indicating that an LLM was involved, or maybe define a reserved tag, “LLM-tainted” or something similar.

I like @Heziode’s policies about owning the code you submit, and requiring a responsible human for contributions.

Finally, we do plan on moving towards more automated acceptance of releases, at least in some cases, but we are not in a hurry and we will want to carefully consider how to do it. The Ada community being relatively small is a curse but also a blessing in that we can manage things more carefully.

In the end there may be no alternative but to open the floodgates and maybe have a tiered system with a fully automated index and a slower, human-curated one. We’ll see. And of course I’m happy to see all discussion on how to improve Alire.

krischik · June 21, 2026, 8:28am

I agree that some form of disclosure makes sense, especially for high-integrity or safety-critical Ada projects.

However, a simple boolean flag might not be very useful in practice. AI can be involved in many different ways: generating documentation or comments, code completion suggestions, research/examples, or writing significant portions of the actual code. A single flag wouldn’t capture that nuance.

We’d probably need either multiple flags or a small enumerated set of categories. Free-text fields would be hard to filter automatically, which I assume is part of the goal.
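A minimal sketch of the “multiple flags” idea, in Python purely for illustration: Alire defines no such field today, and the category names below are hypothetical, simply mirroring the kinds of AI involvement listed above. Declaring a set of flags instead of a single boolean keeps the data machine-filterable, e.g. a consumer could exclude only crates where AI wrote significant code while still accepting AI-written docs.

```python
from enum import Flag, auto


class AIUse(Flag):
    """Hypothetical categories of AI involvement (not an existing Alire field)."""
    NONE = 0
    DOCS = auto()        # documentation or comments
    COMPLETION = auto()  # code-completion suggestions
    RESEARCH = auto()    # research / looking up examples
    CODE = auto()        # significant portions of the actual code


# Hypothetical crates and their declared involvement.
crates = {
    "crate_a": AIUse.NONE,
    "crate_b": AIUse.DOCS | AIUse.COMPLETION,
    "crate_c": AIUse.CODE | AIUse.DOCS,
}


def without(category: AIUse) -> list[str]:
    """Return the names of crates whose declaration does not include `category`."""
    return sorted(name for name, use in crates.items() if not (use & category))


print(without(AIUse.CODE))  # ['crate_a', 'crate_b']
```

A free-text field could still sit next to the flags for nuance, but filtering would rely only on the enumerated part.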

Personally, I believe a strict “no AI allowed” rule would significantly hurt productivity for many contributors. At the same time, I understand why projects in high-integrity domains need clear disclosure.

The important question is: what should the default be, especially for existing crates in the index? Requiring everyone to go back and declare AI usage (or the lack of it) could create a lot of extra work.

waleedmebane · June 21, 2026, 2:59pm

> what should the default be, especially for existing crates in the index?

How about something like “Not recorded” or “Unknown” as the initial value for existing crates?

mosteo · June 22, 2026, 10:57am

Yes. I’ve been reading on the topic since my last post, and the OCaml approach with a few levels like none, assisted, generated, unsupervised (plus unknown) is interesting. But again, there’s a lot of nuance that is left to the honesty/willingness of the submitter.
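The graded scale mentioned here can be sketched as an ordered enumeration. This is a hypothetical Python illustration only: the level names come from the post, but their numeric ordering, the `ai-use` key, the helper functions, and treating `unknown` as the default for crates that never declared anything (as suggested above for existing crates) are assumptions, not a feature of Alire or of the OCaml package repository.

```python
from enum import IntEnum
from typing import Optional


class AILevel(IntEnum):
    """Hypothetical ordered disclosure levels; UNKNOWN sits outside the scale."""
    UNKNOWN = -1       # default for crates that never declared anything
    NONE = 0
    ASSISTED = 1
    GENERATED = 2
    UNSUPERVISED = 3


def declared_level(manifest: dict) -> AILevel:
    """Read a hypothetical 'ai-use' key, defaulting to UNKNOWN when it is absent."""
    value: Optional[str] = manifest.get("ai-use")
    return AILevel[value.upper()] if value else AILevel.UNKNOWN


def acceptable(manifest: dict, at_most: AILevel, allow_unknown: bool = False) -> bool:
    """True if the crate declared a level no higher than `at_most`."""
    level = declared_level(manifest)
    if level is AILevel.UNKNOWN:
        # Undeclared crates are a policy decision, not a point on the scale.
        return allow_unknown
    return level <= at_most


print(acceptable({"ai-use": "assisted"}, AILevel.ASSISTED))   # True
print(acceptable({"ai-use": "generated"}, AILevel.ASSISTED))  # False
print(acceptable({}, AILevel.ASSISTED))                       # False
```

Keeping `unknown` out of the ordering forces whoever filters to decide explicitly whether undeclared crates pass, rather than silently treating them as `none`.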

It’s true that we could require it just to get ourselves off the hook and push the responsibility onto the submitter. Some communities are doing it for that reason, just like when you make a declaration going through customs: of course you can lie and nobody will notice, but if you’re caught after the fact, the consequences fall on you.

> How about something like “Not recorded” or “Unknown” as the initial value for existing crates?

I don’t see any other realistic possibility.

I expect the legal field to be in flux for some years, and there’s the risk of differences between e.g. USA/EU law. So I don’t think we should try to anticipate everything beyond what’s useful and what people demand now.

BTW, I also tried to understand the disclosure the EU will start demanding shortly, which someone pointed to upthread, but IIUC, that only affects providers of AI services.

HawkerFrake · June 22, 2026, 3:54pm

Did anyone notice an increase in higher-quality contributions in the last year or so? If not, I’d say there’s nothing to lose here.

LionelDraghi · June 22, 2026, 4:16pm

Please forgive my possible naivety, but I don’t really see the point of banning, or even just requiring prior disclosure of, the use of an LLM – at least not until there are concrete legal reasons to do so.

One can certainly make pragmatic arguments: for example, we simply cannot reliably verify whatever people might declare about their use of LLMs.

But more fundamentally, I don’t see the actual problem we’re trying to solve. Today, we don’t distinguish between a “good” and a “bad” crate in any formal way, and we would struggle to define “good vs. bad” objectively. Has a crate ever been rejected solely because its code was deemed “bad”?

So what exactly are we afraid of? Being flooded with tons of new crates because it suddenly becomes so easy to generate them? I find that rather unlikely. We already have multiple crates that address the same need, written by humans with significant effort, while we are a small community with limited resources that doesn’t really have the luxury to spread itself too thin. Do we really want to start limiting that now?

Personally, I don’t mind whether AI was used or not. What matters to me is the “quality” of what is being published. From what I’ve seen, there is no clear correlation between quality and the use of AI. There is a correlation between the quality of the result and the experience of the person steering the design — but that’s true with or without AI.

For that reason, I wouldn’t make the crate acceptance process more complex, at least not until a real problem emerges.

krischik · June 23, 2026, 7:06am

Thank you for calling my contributions “nothing of value.” And maybe they are. They are just hobby projects and tutorials.

Fabien.C · June 23, 2026, 8:39am

I think @HawkerFrake was asking a genuine question, and not targeting anyone here.

mosteo · June 23, 2026, 9:35am

I don’t think it’s so much a matter of quality. It’s more ideological, or about possible legal repercussions down the line. So I see it more in terms of licensing. Of course, the crucial difference is that anyone can choose whatever license they like for code they wrote from scratch. Here we are talking about voluntarily disclosing factual information.