What Spec-Driven Development Asks of Coding Agents

GitHub's spec-kit is interesting less for what it does than for what it asserts: that the durable artifact in agent-driven development is the specification, not the conversation that produced it and not the prompt template that shaped the agent. The chat transcript is exhaust. The skill or instruction file is scaffolding. The spec is the thing you version, review, and ship against.

That assertion is correct, in our view. It also has consequences that most teams adopting spec-kit haven't worked through yet.

The artifact hierarchy spec-kit implies

Spec-kit's workflow is roughly: run /specify with a description of what you want to produce the spec, run /plan to produce an implementation plan grounded in your stack, run /tasks to break the plan into discrete units, then /implement to execute. Each stage produces a markdown artifact in the repo. The agent reads forward from spec to plan to tasks; humans review at each gate.

The implicit hierarchy is: spec governs plan, plan governs tasks, tasks govern code. Code is now the least authoritative artifact in the chain — it's the compiled output of the spec, much like a binary is the compiled output of source. If you want to change behaviour, you change the spec and regenerate downward.

This is a real inversion. In conventional development, code is the source of truth and documentation drifts behind it. In spec-driven development, the spec is the source of truth and code is expected to drift behind it — which is fine, because you regenerate. But it only works if the spec is actually treated as primary. The moment a developer edits generated code without updating the spec, the chain breaks and you're back to documentation rot, except now your documentation is also your build input.
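
One way to catch that break early is a pre-merge check over the list of changed files, for example the output of git diff --name-only. The sketch below is ours, not part of spec-kit: the directory names specs and src are assumptions about the repo layout, and the rule (any code change must be accompanied by some spec change) is deliberately crude.

```python
from pathlib import PurePosixPath

# Assumed layout, not mandated by spec-kit: adjust to your repo.
SPEC_DIR = "specs"
CODE_DIRS = ("src",)


def code_changed_without_spec(changed_paths):
    """Return the changed code paths if no spec file changed alongside them.

    An empty list means the change set is acceptable under this rule:
    either no code changed, or at least one spec file changed too.
    """
    paths = [PurePosixPath(p) for p in changed_paths]
    spec_touched = any(p.parts[:1] == (SPEC_DIR,) for p in paths)
    if spec_touched:
        return []
    return [str(p) for p in paths if p.parts and p.parts[0] in CODE_DIRS]
```

A check like this cannot tell whether the spec change actually covers the code change. It only makes the silent case, code edited and spec untouched, visible to a reviewer.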

What this asks of review processes

Most teams have a code review process. Very few have a spec review process with comparable rigour. When the spec becomes the authoritative artifact, this asymmetry becomes a problem.

A spec review needs to catch things a code review never had to: ambiguity that the agent will resolve in unpredictable ways, missing edge cases that won't surface until implementation, assumptions about the stack that conflict with the plan stage, and — critically — scope. A spec that says "add user authentication" gives the agent enormous latitude. A spec that says "add email-and-password authentication using the existing users table, with bcrypt hashing, session cookies expiring after 14 days, and rate limiting at 5 attempts per 15 minutes per IP" gives it almost none. Both are valid specs. Only one is reviewable in any meaningful sense.
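
To make "reviewable" concrete: every clause of the second spec maps to a value a test can assert on, while the first spec offers nothing to assert. The sketch below is illustrative only. The class and function names are hypothetical, and the rate-limit boundary (reject once five attempts already sit inside the window) is our reading of the sentence, which is itself the kind of ambiguity a spec review should settle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuthSpec:
    """Values lifted directly from the detailed spec sentence."""
    credential_type: str = "email-and-password"
    user_table: str = "users"
    hash_algorithm: str = "bcrypt"
    session_ttl: timedelta = timedelta(days=14)
    max_attempts: int = 5
    attempt_window: timedelta = timedelta(minutes=15)
    rate_limit_key: str = "ip"


def is_rate_limited(attempt_times: list[datetime], now: datetime, spec: AuthSpec) -> bool:
    """Return True if a new attempt at `now` should be rejected.

    Assumption: an attempt is rejected when `max_attempts` earlier attempts
    from the same key already fall inside the trailing window.
    """
    recent = [t for t in attempt_times if now - t < spec.attempt_window]
    return len(recent) >= spec.max_attempts
```

A reviewer can disagree with any field here, which is the point: the disagreement happens against the spec rather than against the generated code.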

We think teams adopting spec-kit need to decide, explicitly, what a spec must contain to be mergeable. Acceptance criteria? Non-goals? Stack constraints? Failure modes? The spec-kit templates suggest a structure but don't enforce one, and the templates are easy to skip past when an engineer is in a hurry. The review process is where that discipline has to live.

Versioning specs is not the same as versioning code

Git handles code versioning well because code has clear semantics: a diff is a set of textual changes that produce a deterministic behavioural change when compiled. Specs don't have that property. A one-word change in a spec — "should" to "must", or "users" to "authenticated users" — can produce a sweeping change in the generated implementation. Conversely, a large rewrite of a spec might produce no functional change at all if the rewrite was purely editorial.

This means git log on a spec file tells you less than you think. The actual semantic version of a spec is the behaviour it produces when implemented, and that behaviour depends on the agent, the model version, the plan stage, and any human intervention during /tasks or /implement. Two engineers running the same /implement against the same spec on the same day can produce materially different code.

We suspect the working answer is to version specs alongside their generated plans and tasks: keep all three artifacts in the repo, review them together, and treat the triple as the unit of change. Spec-kit already writes all three into the repo, so the mechanics are in place. The discipline is in not letting the plan and tasks go stale relative to the spec, which is exactly the kind of drift that's hard to enforce without tooling.
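
A minimal sketch of what such tooling might look like: stamp each generated plan and task file with a digest of the spec it was generated from, then flag any artifact whose stamp no longer matches. The spec-sha256 marker line is a convention we are inventing for illustration; spec-kit does not, as far as we know, write anything like it.

```python
import hashlib
import re

# Hypothetical marker line recorded at the top of plan and task files.
DIGEST_LINE = re.compile(r"^spec-sha256:\s*([0-9a-f]{64})\s*$", re.MULTILINE)


def spec_digest(spec_text: str) -> str:
    """SHA-256 of the spec text, as lowercase hex."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def stamp(artifact_text: str, spec_text: str) -> str:
    """Return the artifact with a fresh digest line first, replacing any old one."""
    body = DIGEST_LINE.sub("", artifact_text).lstrip("\n")
    return f"spec-sha256: {spec_digest(spec_text)}\n{body}"


def is_stale(artifact_text: str, spec_text: str) -> bool:
    """True if the artifact has no stamp or was stamped against a different spec."""
    match = DIGEST_LINE.search(artifact_text)
    return match is None or match.group(1) != spec_digest(spec_text)
```

A hash detects textual change, not semantic change, so a purely editorial rewrite of the spec also marks the plan and tasks as stale. We would accept that trade: regenerating after a harmless edit is cheaper than missing a one-word change that matters.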

The authority question

Here is the question we think most teams haven't asked: who has authority to change the spec?

In a conventional codebase, anyone who can open a PR can propose a change to anything. Review gates control what merges. This works because the cost of a bad change is bounded by the code itself — you can revert a commit. But in a spec-driven workflow, a change to the spec is a change to the intended behaviour of the system. If the spec is authoritative, then changing the spec is changing the contract. That's a different kind of authority than changing an implementation detail.

Product teams have historically held this authority through product requirements documents, design docs, and the like. Engineering teams have held it through implementation choices that quietly defined what the system actually does. Spec-driven development collapses these into a single artifact, which means it forces a conversation about who gets to write specs, who gets to amend them, and what happens when product intent and engineering reality disagree at the spec layer.

We don't think there's one right answer. A small team might give every engineer full spec authority and rely on review. A larger team might separate "intent specs" (what the system should do, owned by product) from "technical specs" (how it should do it, owned by engineering) and have spec-kit consume both. What matters is that the question gets asked before the specs accumulate, because retrofitting authority boundaries onto an existing corpus of specs is painful.
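
For the larger-team variant, the authority boundary can be written down as data rather than as custom. The sketch below assumes a hypothetical layout with specs/intent/ and specs/technical/ directories and two approver groups named product and engineering; none of these names come from spec-kit.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: first matching pattern wins.
AUTHORITY = [
    ("specs/intent/*", {"product"}),
    ("specs/technical/*", {"engineering"}),
    ("specs/*", {"product", "engineering"}),
]


def required_approvers(path: str) -> set[str]:
    """Return the groups that must approve a change to `path`."""
    for pattern, groups in AUTHORITY:
        if fnmatchcase(path, pattern):
            return set(groups)
    return set()  # no rule: ordinary code review applies


def may_merge(path: str, approvals) -> bool:
    """True if every required group appears among the approvals given."""
    return required_approvers(path) <= set(approvals)
```

The small-team model from the paragraph above is the same table with a single rule. Either way, the value is that the answer to "who can change this spec?" lives in the repo and is itself reviewable.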

What we'd actually do

If we were standing up spec-kit on a new project tomorrow, we'd commit to four things upfront:

1. A spec template with required sections. Goals, non-goals, acceptance criteria, stack constraints, failure modes. Specs missing any section don't merge. This is enforceable with a linter on the spec file itself.
2. A spec review checklist distinct from code review. Focused on ambiguity, scope, and testability rather than style or correctness. Reviewed by at least one person who didn't write the spec, ideally one with product context and one with engineering context.
3. Spec, plan, and tasks committed together. No merging a spec without its generated plan and task breakdown, so reviewers can see what the agent intends to do before it does it. The plan stage is often where ambiguity in the spec surfaces; reviewing it is cheaper than reviewing the resulting code.
4. An explicit authority model. Written down somewhere in the repo. Who can write specs, who can amend them, what triggers a spec change versus an implementation change. Not because the rules need to be rigid, but because the absence of rules defaults to whoever is loudest in standup.
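
The linter in the first commitment does not need to be elaborate. Here is a minimal sketch, assuming the spec is a markdown file that uses #-style headings and that the section titles match the five names listed above exactly (case aside). Both are our assumptions, not spec-kit's template.

```python
import re

REQUIRED_SECTIONS = (
    "Goals",
    "Non-goals",
    "Acceptance criteria",
    "Stack constraints",
    "Failure modes",
)

# Matches markdown headings such as "## Goals".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(spec_text: str) -> list[str]:
    """Return required sections that are absent or have an empty body."""
    matches = list(HEADING.finditer(spec_text))
    bodies = {}
    for i, m in enumerate(matches):
        # A section's body runs until the next heading or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(spec_text)
        bodies[m.group(1).strip().lower()] = spec_text[m.end():end].strip()
    return [s for s in REQUIRED_SECTIONS if not bodies.get(s.lower())]
```

In CI this would run over each changed spec file and fail the build when the returned list is non-empty. It checks presence, not quality: a "Failure modes" section containing "none" passes, which is why the review checklist still matters.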

Spec-kit is a good tool, and spec-driven development is, we think, the more honest way to work with capable coding agents. But the tool ships with a workflow, not a governance model. The governance model is the team's job, and it's worth doing before the specs become the product.