Agent-Sourced: A Provenance Tag for the Agent Era
Reading Time:
Reading Time:
This is the short version of an idea I'm putting on record. I make the case more fully on the TwiceData blog, and in full — with the tier criteria, the transition and licensing rules, and a call to codify it in the open — in a white paper. I'm coining a term here: agent-sourced.
Open source has a trust problem, and it arrived faster than anyone was ready for. Through 2025 and into 2026, maintainers started drowning. curl ended its bug-bounty program in early 2026 after AI “slop” overwhelmed its security queue — not even one in twenty submissions was a genuine vulnerability. Zig adopted an outright no-LLM policy; Gentoo forbids contributions made with AI tools; NetBSD presumes LLM-generated code “tainted”; QEMU banned AI contributions on licensing grounds (a policy it’s now reconsidering); and GIMP and Flathub banned them outright. LLVM took the opposite tack — a human-in-the-loop policy that permits AI-assisted code so long as a person vouches for it.
It crystallized into a public fight. On one side, DHH: banning AI betrays open source’s founding mission — everyone’s right to change software. On the other, ThePrimeagen: the bans are triage, and quality has to stay with a person. Both are right, which is exactly why the argument doesn’t resolve. The frame forces a blunt binary: ban everything, or drown.
“Should agents be allowed to contribute” is the wrong question. The right one is how their contributions are labeled.
I spend a lot of my time in healthcare data, where provenance is not an abstraction. In clinical and biomedical software, “where did this come from, and who checked it” is the difference between a number you can act on and one you can’t. We already accept that a result is only as trustworthy as its lineage. The code agents are now writing deserves the same treatment — not a verdict on whether it’s allowed, but an honest label on where it came from.
Agent-sourced (n., adj.) — a change, project, or artifact created mostly autonomously by an agent (one or more, not necessarily a population), with human input extremely low. It is a provenance label, the agent-era analog to crowdsourced: it tells the audience what they’re looking at, so the work can be trusted, reviewed, and used accordingly.
That’s the whole move. When the producers of code change, the first thing you owe everyone downstream is honest provenance.
A tag does two things at once. It flags the contribution for deeper review, so a maintainer knows exactly where to look harder. And it still lets agent-driven work through — welcomed, not forbidden. Honest provenance instead of gatekeeping or blind faith.
It’s also the low-commitment option, which is why it could actually be adopted. Today a maintainer’s only moves are expensive: fully review and include a contribution, or ban the whole category. An agent-sourced tag is the lighter middle — you neither vet everything nor forbid everyone; you label, and let the label route attention. And crucially, it ships as an add-on, not another bot: a tag on a pull request or commit, not one more AI that fixes issues or reviews PRs. Those add load and noise — the very thing maintainers are banning. The tag changes almost nothing in the workflow; it just makes origin legible.
The useful distinction isn’t a long ladder of labels — it’s two tiers, and the gate between them.
Tier 1 — raw agent-sourced. The agent’s output as submitted, not yet vetted by a person. No one vouches for it. A maintainer can fully ignore it, guilt-free, or browse it when they have time.
Tier 2 — human-verified agent-sourced. A person has reviewed it and vouches for it — but it is still agent-sourced. Verification doesn’t erase provenance; both facts travel together (made by an agent, checked by a human). Tier 2 is a required, non-bypassable designator: nothing reaches a release until it has earned it.
Agent-sourced code is a double-edged sword, and the tag denies neither edge. It can hide bug-bombs — defects that, merged unverified, take weeks to dig out (exactly ThePrimeagen’s fear). It can also deliver deep, fast innovation (exactly DHH’s hope). The tag validates both: flag it, verify before you trust, and welcome it, don’t ban it. In practice the buffer is the fork — agent work lives on a tagged fork, quarantined from the base, and when something is genuinely good a human cherry-picks it into the base. The base stays clean, innovation stays free, and a person still holds the gate.
Concretely, agent-sourced is two things you can build. First, an identifier — the provenance tag itself, carried on a commit or pull request: a signed trailer, a label, a field a tool can read. Second, a verifier / PR framework — the machinery that promotes Tier 1 to Tier 2: who attests, what gets checked, and where the gate sits in the pull-request flow. That’s what makes it real rather than rhetorical.
The honest hard part is governance: who decides what is agent-sourced versus agent-assisted versus human, and who verifies it? The line is genuinely contested — and it shouldn’t be decreed by any one of us. A real standard would need attestation, a clear boundary between assisted and sourced, transition rules for when a human edits agent work, and licensing terms that travel with the tag — and it must be set by a body of industry leaders, the way the OSI defined “open source,” C2PA defined content provenance, and SPDX standardized license identifiers.
So consider this an invitation. If you maintain a project, steward a foundation, build the agents doing the contributing, or work the licensing side — let’s convene and define it. The open-source AI fight doesn’t have to end with a ban or a flood. It can end with graduated, verifiable trust: agent work welcomed, labeled, and reviewed in proportion to where it came from.
A fuller treatment is on the TwiceData blog, and the complete technical argument — tiers, transition, licensing, and the call to codify — is the Agent-Sourced white paper (also available as a PDF).
— Gyasi Sutton, MD, MPH