Industry Commentary

Who Owns the Diff When an Agent Writes It?

By John Jansen · 7 min read


The legal scaffolding around software was built on a quiet assumption: a human typed the code. Copyright vests in authors. Contributor License Agreements ask individuals (or their employers) to warrant originality and grant rights. Open source licenses flow from a chain of authorship that begins with a person making expressive choices. The whole edifice — from the DCO sign-off to enterprise IP indemnification clauses — presumes an identifiable human author whose work product can be attributed, warranted, and assigned.

That assumption is degrading fast. On several of our internal services, agent-authored diffs already outnumber human-authored ones. The human role has shifted toward review, scoping, and architectural decisions; the keystrokes belong to the model. We don't think this is unusual — it's just earlier than most teams will admit publicly. And it surfaces a set of questions that procurement, legal, and OSS maintainers should be answering now, before the majority of commits flowing through their systems are ones no human typed.

What the existing frameworks actually require

It's worth being precise about what current frameworks demand, because the friction points aren't always where people assume.

A standard CLA (Apache's, for example) asks the contributor to certify that the contribution is their original creation, that they have the right to grant the license, and that they're aware of the license terms. The Developer Certificate of Origin — the lighter-weight alternative used by the Linux kernel and many others — asks the contributor to assert one of three things: they wrote it, it was contributed under a compatible license, or it was provided to them by someone who made one of those assertions.

Neither contemplates a contribution where the expressive choices were made by a statistical model trained on a corpus of unknown provenance. The US Copyright Office's current position — that purely AI-generated output isn't copyrightable absent meaningful human authorship — means agent-authored code may sit in a strange limbo: contributed under a license that presumes copyright, but where the copyright itself is questionable.

For enterprise procurement, the analogue is the IP indemnification clause. Vendors warrant that their deliverables don't infringe third-party IP and agree to defend the buyer if a claim arises. When the deliverable was authored by an agent trained on code with mixed licensing, that warranty becomes harder to give honestly — and harder to price.

Three failure modes that are already happening

The provenance gap. A contributor opens a PR. The DCO sign-off is present. The code works. But the contributor used an agent that may have reproduced a non-trivial chunk of GPL-licensed training data verbatim. The maintainer has no practical way to detect this, and the contributor may not know either. The sign-off is technically false but in good faith. When the project later gets a license-compliance complaint, the chain of attribution doesn't help anyone.

The warranty gap. An enterprise buys a software product. The vendor's engineers used coding agents extensively. The vendor's IP indemnification clause was drafted in 2019 and assumes human authors trained on the company's own codebase. When a third party claims a snippet matches their proprietary code, the vendor's insurer asks how the code was produced. The answer — "an agent wrote it, our engineer reviewed and merged it" — wasn't contemplated when the policy was underwritten.

The attribution gap. A maintainer wants to enforce a copyleft license. To do so they need to demonstrate ownership of the copyright in the work. If a meaningful share of the codebase was authored by agents, and agent output isn't copyrightable, the enforceable surface of the license shrinks. This isn't theoretical — it's the natural consequence of the Copyright Office's stated position applied to a codebase where authorship is mixed.

What we think needs to change

The instinct in some quarters is to ban agent-authored contributions. Some projects have tried this. We think it's a losing position — not because it's wrong in principle, but because it's unenforceable. You cannot reliably detect agent-authored code, and the line between "the agent wrote it" and "the human used the agent as autocomplete" is not a line that survives contact with how people actually work.

A more honest path has three parts.

Disclosure, not prohibition. CLAs and DCOs should be amended to require contributors to disclose when an agent materially authored the contribution, and to warrant what they can warrant: that they reviewed the output, that they tested it, and that to their knowledge it doesn't reproduce identifiable third-party code. This shifts the warranty from "I wrote this" to "I am responsible for this" — which is closer to what review-based contribution has always actually meant, but makes the responsibility explicit.

Provenance metadata in the commit. Git already supports trailers. There's no reason a commit can't carry structured metadata indicating which agent produced the diff, what model version, what prompt context, and what review steps the human took. We've started doing this internally. It costs nothing, it makes audit possible, and it gives downstream consumers something to filter on. The SPDX and SLSA communities are adjacent to this problem; the gap is a standard trailer format that tools can emit by default.

Indemnification that prices in agents. Enterprise vendors should stop pretending their engineers don't use agents and start offering indemnification that explicitly covers agent-authored code, with the price reflecting actual risk. The vendors who can do this — because they've invested in training-data auditing, output filtering, and provenance tracking — will have a real commercial advantage over those who hand-wave. Buyers should ask the question directly in RFPs: how is agent-authored code in your deliverables tracked, reviewed, and indemnified?

The harder question for OSS

The enterprise side will sort itself out, because money concentrates the mind. The OSS side is harder, because maintainers are unpaid and the tools to enforce any new policy don't exist.

Here's the position we'd take if we maintained a major project today: we'd treat agent-authored contributions as a first-class category, require disclosure, and make the human contributor — not the agent — the warranting party. The DCO already does most of this work; it just needs an additional clause acknowledging that the contributor may have used tools that authored substantial portions, and that the contributor has reviewed the output and accepts responsibility for it as if they had written it themselves.
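Concretely, such a clause might read something like this (our own wording, offered as a sketch, not official DCO language):

```
I understand that tools, including AI coding agents, may have authored
substantial portions of this contribution. I have reviewed and tested
the output, and to my knowledge it does not reproduce identifiable
third-party code. I accept responsibility for this contribution under
the terms of this certificate as if I had written it myself.
```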

This is closer to how editorial responsibility works in publishing than how authorship works in copyright, and that's probably the right model. The maintainer doesn't need to know who typed each character. They need to know who is accountable for what gets merged.

Where this lands

The frameworks we have were built for a world where authorship and accountability were the same thing, vested in the same person. Agents have separated those two things. Authorship — in the sense of who made the expressive choices — increasingly belongs to the model. Accountability still has to belong to a human, because models can't be sued, can't sign CLAs, and can't be enjoined.

The practical work of the next two years is rebuilding the contracts, sign-offs, and procurement clauses around accountability rather than authorship. The projects and vendors that do this early will have cleaner audit trails, more defensible licenses, and less friction with their lawyers. The ones that wait until a court forces the question will find out what their existing paperwork is actually worth, which we suspect is less than they think.

Want to discuss this?

We write about what we're actually working on. If this is relevant to something you're building, we'd love to hear about it.