Some open-source projects currently ban or restrict contributions of code containing LLM outputs, or require contributors to disclose their presence. See Zig and OpenJDK for bans, and Linux and Julia for disclosure policies. A draft policy requiring disclosure has been proposed for Rust. A blog post notes that AI content disclosure might be required in the EU in August and proposes that the OCaml package repository incorporate a text field for projects to voluntarily disclose AI use. The proposal was further discussed on the OCaml discourse forum.
Projects that do not wish to depend on LLM outputs, or that want to document such dependencies, would benefit from a disclosure field in Alire or some other disclosure convention (e.g., one maintained outside Alire).
I couldn’t find anything like that in the Alire documentation, the GitHub repo, or this forum, but I have also never used Alire to distribute software, so I may have missed something obvious with respect to Alire. If so, I apologize. Otherwise, what are your opinions on how this could best be accomplished and how disclosure could be encouraged? Thanks!
How would it be enforced? Is AI-generated content always that easy to detect?
What if it’s AI-generated, then modified by the programmer? For example, I used an LLM to generate some Ada code for me a month or two ago, then went over it pretty carefully, changing variable names to something more meaningful, adding comments, etc.
My concern is that any policy being suggested is essentially an honor policy; i.e., practically unenforceable. I’d be happy to know if I’m wrong.
I wasn’t presuming that it would be required. It could instead be encouraged or requested; in any case, it would be enabled by a convention. I would be in favor of requiring it, but I don’t expect that it could be enforced reliably.
If the result would be considered derived from the LLM output, as in your example, or even where more than variable names were changed and none of the verbatim output remains, I would want that disclosed. It need not use the same text or tag as verbatim outputs if that is a concern.
I think that an honor policy is a good approach in this context. The package or project creators only need to disclose, and the primary risk to them from disclosure is that people who prefer not to use software incorporating LLM outputs might not use their software.