What Specs Own When Agents Write the Code

Something quietly significant is happening in the way agent-era teams organise their work. GitHub's spec-kit, Anthropic's agent skills format, the proliferation of AGENTS.md files, the cursor-rules and claude-rules repos trending each week — these aren't isolated tools. They're the early shape of a discipline. Spec-driven development, after years of being the thing serious teams half-did and never named, is becoming a recognisable practice with conventions, file formats, and an emerging vocabulary.

We've been building with this pattern for long enough to notice when it shifts from "nice habit" to "how the work is structured". And the shift is happening now, for a specific reason: when agents do most of the writing, the spec stops being documentation and starts being the thing you actually edit.

The inversion

The old contract between specs and code was simple. Code was the source of truth. Specs were a derived artefact — useful, often stale, sometimes ceremonial. You wrote the spec to align humans, then humans wrote the code, and the code was what actually ran. If the two diverged, code won by default because code was what shipped.

Agents invert that. When a coding agent can regenerate a module from a description in under a minute, the description starts carrying more weight than the generated artefact. The code becomes the derived thing — recompilable from intent. Not entirely, not yet, but enough that the centre of gravity has moved.

This is what spec-kit is quietly formalising. The repo treats the spec as the primary edit surface: to change the implementation, you go back and change the spec. That idea is not revolutionary on its own. Cucumber, literate programming, and design-by-contract all pushed in this direction. What's different is that the regeneration loop now actually closes. You can change the spec and have a plausible implementation in minutes, not weeks. The economics of where to put your effort have changed.

What specs are good at owning

Not everything wants to live in a spec. The genuinely useful work right now is figuring out the boundary, and it's more interesting than "specs for behaviour, code for implementation".

Specs are good at owning intent and invariants. What is this system supposed to do, for whom, under what constraints. What must never happen. What the data means. These are things that change slowly, that humans need to agree on, and that agents need as grounding. A spec that captures "orders are immutable once paid" is doing real work — it constrains every future generation, it survives refactors, and it's the kind of thing a human reviewer can actually verify.
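To make the "orders are immutable once paid" example concrete, here is a minimal sketch of how such an invariant might surface as a guard in code. The Order class, its fields, and the InvariantViolation exception are hypothetical names invented for illustration, not taken from any real project.

```python
from dataclasses import dataclass, field
from enum import Enum


class OrderStatus(Enum):
    DRAFT = "draft"
    PAID = "paid"


class InvariantViolation(Exception):
    """Raised when a change would break a spec-owned invariant."""


@dataclass
class Order:
    # Hypothetical order model; field names are illustrative.
    order_id: str
    items: list[str] = field(default_factory=list)
    status: OrderStatus = OrderStatus.DRAFT

    def add_item(self, sku: str) -> None:
        # Spec invariant: orders are immutable once paid.
        if self.status is OrderStatus.PAID:
            raise InvariantViolation("orders are immutable once paid")
        self.items.append(sku)

    def mark_paid(self) -> None:
        self.status = OrderStatus.PAID
```

The guard is one line, but it traces back to a sentence a human reviewer agreed to, so an agent regenerating this class can restructure everything else and still be held to it.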

Specs are good at owning interface contracts. The shape of an API, the schema of a message, the structure of a config file. These are the joints of the system, where one component's assumptions meet another's. Pinning them in a spec — not just in code — means an agent regenerating one side has a fixed point to generate against.
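As a rough illustration of pinning a contract outside the implementation, the sketch below keeps a message shape as plain data and checks messages against it. The "order paid" message, its field names, and the contract_violations helper are all assumptions made up for this example. A real project would more likely use a schema language it already depends on.

```python
# Hypothetical contract for an "order paid" message; field names are illustrative.
ORDER_PAID_CONTRACT = {
    "order_id": str,
    "amount_cents": int,
    "currency": str,
}


def contract_violations(message: dict, contract: dict) -> list[str]:
    """Return readable violations; an empty list means the message conforms."""
    problems = []
    # Every field the contract names must be present with the right type.
    for name, expected_type in contract.items():
        if name not in message:
            problems.append(f"missing field: {name}")
        elif not isinstance(message[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    # Fields the contract does not name are reported too.
    for name in message:
        if name not in contract:
            problems.append(f"unexpected field: {name}")
    return problems
```

Because the contract is data rather than code, both the producer and the consumer can be regenerated independently and checked against the same fixed point.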

Specs are good at owning acceptance criteria. Concrete, testable statements about what "done" looks like. This is where spec-driven development earns its keep with agents: a generation loop without a clear oracle is just guessing. If the spec says "a user with an expired token receives 401 and a Retry-After header", the agent has something to write against and something to verify against.
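The expired-token criterion translates almost word for word into a test. The sketch below assumes a hypothetical handle_request function standing in for whatever the agent generates. The Response type, the function signature, and the header value are illustrative assumptions; only the 401 status and the presence of Retry-After come from the criterion itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Response:
    status: int
    headers: dict[str, str] = field(default_factory=dict)


def handle_request(token_expires_at: datetime, now: datetime) -> Response:
    """Hypothetical handler; stands in for the generated implementation."""
    if token_expires_at <= now:
        # The header value is an assumption; the criterion only requires presence.
        return Response(401, {"Retry-After": "0"})
    return Response(200)


def test_expired_token_gets_401_with_retry_after() -> None:
    # Acceptance criterion: an expired token receives 401 and a Retry-After header.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    response = handle_request(now - timedelta(seconds=1), now)
    assert response.status == 401
    assert "Retry-After" in response.headers
```

The test is the oracle: the agent can rewrite handle_request however it likes, and the loop converges only when this assertion passes.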

What code still owns

The temptation, watching the spec-kit wave, is to push everything upstream. Resist that. Code still owns things specs are bad at.

Code owns the actual behaviour under load. A spec can describe rate limits; only code, running, tells you whether the queue backs up at 3am. Performance characteristics, memory behaviour, the precise interleaving of concurrent operations — these emerge from implementation choices that no spec at any reasonable resolution will capture.

Code owns the integration surface with reality. Every system eventually meets a library with a quirky timeout, a database with a non-obvious lock escalation, a third-party API with undocumented behaviour on Tuesdays. The accumulated knowledge of "we do it this way because the alternative breaks in production" lives in code and in tests, and pretending it can be lifted into prose loses the very specificity that makes it useful.

Code owns the local decisions that don't matter globally. Whether a helper is a function or a method, whether you use a map or a switch, whether the variable is called result or out. Specs that try to govern this become noise. One of the failure modes we're already seeing in over-eager spec-driven repos is specs that have grown to mirror the code — at which point you have two copies of the implementation and twice the maintenance.

The architectural question

The real design work for agent-era teams is drawing this line deliberately, per system, and writing it down. Which decisions are spec-owned and survive regeneration? Which are code-owned and survive spec rewrites? Which are negotiable?

We've started treating this as an explicit artefact on projects — a short document, often half a page, that says: these invariants live in the spec, these interfaces live in schemas, these behaviours live in tests, and everything else is implementation detail the agent can revise freely. It sounds bureaucratic. In practice it's the difference between an agent loop that converges and one that drifts.
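One way to make that half-page enforceable is to mirror it as a small ownership map that tooling can consult before an agent touches a file. This is only a sketch: the directory layout, the owner labels, and the function names are hypothetical, and the matching uses Python's standard fnmatch module, where "*" also matches path separators.

```python
from fnmatch import fnmatch

# Hypothetical ownership map; paths and owner labels are illustrative.
# The first matching pattern wins, so specific rules go before the catch-all.
OWNERSHIP = [
    ("spec/invariants/*", "spec"),
    ("schemas/*", "schema"),
    ("tests/acceptance/*", "tests"),
    ("src/*", "regenerable"),
]


def owner_of(path: str) -> str:
    """Return the owner label for a path, or "unassigned" if no rule matches."""
    for pattern, owner in OWNERSHIP:
        if fnmatch(path, pattern):
            return owner
    return "unassigned"


def agent_may_rewrite(path: str) -> bool:
    """Only code marked regenerable is open to free revision by the agent."""
    return owner_of(path) == "regenerable"
```

For example, agent_may_rewrite("src/orders/helpers.py") is true, while a change to "schemas/order_paid.json" would be routed to human review instead. Anything that comes back "unassigned" is a prompt to extend the boundary document rather than to guess.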

Without this boundary, two failure modes show up. The first is spec bloat: every bug fix gets lifted into the spec, the spec swells to match the code, and you've recreated the old problem of duplicate sources of truth with extra steps. The second is spec starvation: the spec stays thin, the agent fills the gaps with plausible-but-wrong assumptions, and the codebase accumulates inconsistencies that no single review catches because each generation looked fine on its own.

The spec-kit-style repos that are working tend to share a structure: a small number of high-signal documents (intent, invariants, interfaces), a larger set of executable specs (tests, schemas, examples), and a clear convention that code below a certain level is regenerable. The trending agent-skills repos add another layer — reusable capability descriptions that compose. This is starting to look like a stack: skills compose into agents, agents work against specs, specs are verified by tests, tests run against code. Each layer has a clear job.

Where this leaves teams

The practical move, if you're building seriously with agents right now, is to stop treating specs as documentation and start treating them as source. That means version control discipline, review processes, and the same engineering rigour you'd apply to any other artefact that the build depends on. It also means accepting that the spec-writing skill — being precise about intent without overspecifying implementation — is becoming a core engineering competency, not a product-management one.

The teams that will pull ahead aren't the ones with the best agents. The agents are roughly fungible and getting more so. The teams that pull ahead are the ones who've done the unglamorous work of deciding what their specs own, what their code owns, and how the two stay honest with each other. That's the discipline spec-kit and its neighbours are pointing at. It's worth taking seriously now, while the conventions are still being set. Where this all converges (specs as the primary source, the IDE receding behind them) is the subject of "From IDEs to spec compilers".