Why Sandi Metz's Duplication Essay Keeps Coming Back

Every few months, Sandi Metz's "The Wrong Abstraction" climbs back up Hacker News. The most recent resurgence — sometime around the wider adoption of agentic editors like Cursor, Claude Code, and Cline — is not nostalgia. The essay reads differently now because the failure mode it describes has become cheaper to produce and harder to spot.

Metz's argument, briefly: duplication is far cheaper than the wrong abstraction. When two pieces of code look similar but evolve for different reasons, hoisting them into a shared function or class creates a gravity well. Every future change must either fit the abstraction or fight it. The fight usually loses, and the abstraction accretes flags, optional parameters, and special cases until nobody can safely touch it.

We have watched LLM-driven editors reach for that gravity well almost reflexively. Two functions with similar shapes, even when their semantics diverge, get extracted into a processItem(item, options) helper within a single edit. The model has pattern-matched on syntax, not on the reasons the code exists. And because the diff looks clean — fewer lines, a shared helper, a tidy import — it sails through review.

What the models are actually optimising for

LLMs are trained on corpora where DRY is treated as a near-universal good. They have read thousands of refactoring tutorials, style guides, and Stack Overflow answers where the punchline is "extract a function." They have not read the post-mortems where that extraction had to be unwound eighteen months later because the billing path and the notification path were never really the same thing.

So when an agentic editor sees two blocks with three matching lines, the prior is strong: extract. The model is not weighing the cost of a future divergence because that cost is invisible at generation time. It is optimising for the local readability of the current diff.

This is the inverse of how a careful engineer reads similar code. The engineer asks: do these two things change together? Are they similar because they share a domain concept, or because two different domain concepts happen to involve a loop and a map lookup? The LLM cannot ask that question without being told to.

The shape of the problem in practice

A few patterns we now see often enough to name them:

Coincidental shape collapse. Two handlers — one for webhook ingestion, one for a CLI command — both validate input, call a service, and return a status. The model extracts a handle(input, service) wrapper. Six weeks later, the webhook needs retry semantics and idempotency keys; the CLI needs progress reporting. The wrapper grows a mode parameter, then a context object, then a callback. Nobody remembers why it exists.

Premature interface invention. Asked to add a second payment provider, the model invents a PaymentProvider interface with methods that happen to match the first provider's API. The second provider's quirks — partial captures, asynchronous settlement — get smuggled in as optional fields. The interface is now a description of provider one with provider two stapled on.

Shared types across bounded contexts. The model notices that User appears in the auth module and the billing module, and helpfully consolidates them into a shared type. Auth cares about credentials and sessions; billing cares about tax jurisdictions and payment methods. The merged type leaks both concerns everywhere either is used.

None of these are catastrophic on the day they land. They are catastrophic on the day, months later, when a small change requires touching the shared thing and the blast radius turns out to be the whole codebase.

A position we hold during review

We have settled on a few rules that have held up across client engagements.

Three, not two. We do not extract a shared abstraction from two examples. Two is a coincidence; three is the start of a pattern. If an agent proposes a helper from two call sites, we ask it to inline both and revisit when a third appears. This is straight from Metz, and it remains the single highest-leverage rule we apply.

Name the reason, not the shape. A shared function called processItem or handleRequest is a red flag. The name describes the shape of the code, not the reason it exists. If the agent cannot propose a name that refers to a domain concept — chargeInvoice, enqueueWebhookRetry — the abstraction probably should not exist yet.

Bounded context discipline. Types and helpers do not cross module boundaries without an explicit decision. We tell the agent which directories are allowed to share code and which are not. A User in auth/ and a User in billing/ are two different types until somebody writes a one-paragraph note explaining why they should be one.

Diff size as a signal, not a goal. Agents love to report that a refactor removed lines. We do not treat line count as a quality metric in review. A change that duplicates twenty lines but keeps two call sites independently evolvable is often the right answer. We say this out loud in PR comments, because otherwise the next agent run will re-extract.

Refactor only when changing. We do not let agents refactor code they are not otherwise touching. "While I was here, I noticed these two functions could share a helper" is the exact failure mode Metz describes. If the agent wants to extract, it must do so as part of a change with a concrete reason.

Guardrails that survive the agent

Review discipline only goes so far. A few things we bake into the repository itself:

CLAUDE.md / cursor rules files that state the duplication-over-abstraction position explicitly, with examples. Models follow these much more reliably than vague style guides. We include a short list of names that are banned without justification: Base, Common, Shared, Utils, Helper, Manager.
Module boundaries enforced by linters, not convention. eslint-plugin-boundaries or equivalent. If the agent tries to import across a forbidden boundary to enable an extraction, the build fails before the PR opens.
Architecture decision records for any shared abstraction. Trivially, the agent cannot create a shared module without an ADR. This is friction by design. It does not stop the agent; it forces a human moment.
Tests that pin behaviour, not implementation. When duplication is the right answer, the two copies need independent tests. We do not let agents write a single parameterised test that covers both, because the test itself becomes an abstraction that resists divergence.

Why this matters more now

The cost of writing code has dropped. The cost of changing code has not. If anything, agentic coding raises the cost of changing code that has been prematurely abstracted, because the agent will faithfully propagate the abstraction's assumptions into every new feature until somebody stops it.

Metz's essay was written for a world where a senior engineer might extract a bad abstraction once a quarter and live with the consequences. We are now in a world where an agent can produce a dozen of them in an afternoon, each one locally sensible, collectively a tax on every future change.

The answer is not to distrust the tools. The answer is to be specific, in writing and in review, about when abstraction is earned. Duplication is cheap. The wrong abstraction is not. That trade has not changed; only the rate at which we are asked to make it.