There is a quiet pattern emerging in codebases that lean heavily on AI assistance: the more boring the language, the better the output. Python, TypeScript, Go, Java, plain old C#. The stuff that fills Stack Overflow and a decade of GitHub history. Meanwhile, the more interesting languages — Gleam, Roc, Unison, even relatively established ones like Elixir or F# — produce noticeably weaker results from the same tools, the same prompts, the same engineers.
We think this matters more than most teams have admitted to themselves. Picking a stack used to be about ergonomics, hiring, runtime characteristics, and taste. Now there is a new axis: how much leverage does this language give an LLM? And the honest answer is that this axis dominates several of the others.
What the models actually know
LLMs are statistical artifacts of their training data. When a language has fifteen years of public code, hundreds of thousands of Stack Overflow answers, multiple generations of frameworks all documented in blog posts, and a long tail of bug reports with resolutions, the model has seen the shape of nearly every problem you are likely to encounter. It has seen the idiomatic solution, the wrong solution, the migration from the wrong solution to the right one, and the post-mortem.
For a language like Gleam, which we genuinely like, the model has seen perhaps a few thousand repositories, much of that code written by the same handful of people and much of it experimental. Ask Claude or GPT-5 to write non-trivial Gleam and you get code that compiles maybe sixty percent of the time, uses outdated syntax from pre-1.0 versions, and invents stdlib functions that do not exist. The same prompt in TypeScript produces working code on the first try, with reasonable test coverage.
This is not a small effect. In our internal measurements across client work, the gap between mainstream and niche languages, measured as agent loops needed to complete a task, is roughly 3-4x. A task that takes one agent loop in Python takes four in Elixir, and may never converge in something more exotic. When you multiply that across a codebase and a team, the cost is staggering.
The compounding effect on agentic workflows
Single-shot completions are forgiving. If the model writes slightly wrong code, a human catches it in review and fixes it. The cost is small.
Agentic workflows are not forgiving. When Claude Code or a similar agent is running a twenty-step task (reading files, writing changes, running tests, interpreting failures, retrying), every step has some probability of being correct. If the per-step success rate drops from 95% to 80% because the language is unfamiliar, the end-to-end success rate over twenty steps collapses from about 36% (0.95^20) to about 1% (0.80^20). The agent gets stuck, burns tokens, and produces a mess that takes longer to clean up than the original task would have taken to do by hand.
This is the central economic argument. Boring languages are not just slightly better for AI-assisted work. They are categorically different, because they sit on the right side of an exponential.
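The compounding in the twenty-step example is easy to check. The sketch below is a simplification that assumes every step succeeds independently with the same probability, so end-to-end success is the per-step rate raised to the number of steps. Real agents retry and their failures correlate, so read it as the shape of the curve rather than a forecast. The function name is ours, for illustration only.

```python
"""Compounding per-step success across a multi-step agent task.

Assumes every step succeeds independently with the same probability,
which is a simplification: real agents retry and failures correlate.
"""


def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Compare a few per-step rates over the same twenty-step task.
    for rate in (0.95, 0.90, 0.80):
        p = end_to_end_success(rate, 20)
        print(f"per-step {rate:.0%} over 20 steps -> {p:.1%} end to end")
```

Run as a script, it prints about 35.8%, 12.2%, and 1.2% for per-step rates of 95%, 90%, and 80%: losing fifteen points of per-step reliability costs roughly a thirtyfold drop in end-to-end success.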
What "boring" actually means here
We are not arguing for Java 8 forever. The relevant properties are:
- Volume of public code, weighted toward recent versions
- Stability of idioms — the language hasn't reinvented its async model three times in five years
- Strong, predictable tooling — one package manager, one formatter, one test runner that the model knows by name
- Clear error messages that the model can parse and act on
- A type system the model can reason about, or no type system at all (the middle ground of gradual typing tends to confuse it)
By these criteria, modern TypeScript is boring. Go is extremely boring. Python is boring despite its sprawl, because the sprawl is well-documented. Rust is becoming boring, which is interesting: its training data has crossed some threshold in the last eighteen months, and models now write competent Rust where two years ago they did not.
Elixir is not boring in this sense, despite being fifteen years old, because the volume of code is small relative to the surface area. Kotlin is borderline. Swift is borderline and getting worse as Apple churns the language. Anything with "effects" in its type system is, for now, off the table.
The counter-argument and why it is weaker than it looks
The obvious objection: language choice should optimize for the problem domain and the team's expertise, not for what a model happens to know today. Models will improve. Niche languages will get better coverage. Choosing your stack based on current LLM capability is short-sighted.
We partially agree. But two things complicate this.
First, the gap is not closing as fast as people assume. Training data for niche languages grows slowly because the communities are small. Even as base models get more capable, the relative advantage of mainstream languages persists, because those languages also generate more training data each year. The flywheel is asymmetric.
Second, the productivity delta is already large enough that it dwarfs most domain-fit arguments. If your team ships three times faster in TypeScript because the agent loop actually closes, that compounds into more iterations, more learning, better products. The theoretical elegance of pattern matching in Gleam does not survive contact with a 4x slowdown in delivery.
There is a real exception: when correctness matters more than velocity. If you are writing a payments ledger or a flight control system, the guarantees of a stronger type system may be worth the AI friction. But this is a much narrower set of projects than the people advocating for novel languages tend to admit.
How this reshapes stack decisions
We have started treating language choice as an explicit leverage decision, not a taste decision. The question we ask now is: how many tokens does it cost to get a working change into this codebase? That number is dominated by the language and framework choice, not by engineer skill.
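One way to make that question concrete is a deliberately crude cost model. Everything in the sketch below is an assumption for illustration: the function names, the 50,000-token attempt size, and the premise that a failed run is thrown away and retried in full. Under that premise the number of attempts is geometric, with mean 1 / p^n for per-step rate p over n steps.

```python
"""Rough expected-token cost of landing one working change.

Hypothetical model, not a measurement: an attempt is `steps` independent
steps that each succeed with probability `per_step`. A failed attempt is
discarded and retried from scratch at the same token cost, so the number
of attempts until success is geometrically distributed.
"""


def expected_attempts(per_step: float, steps: int) -> float:
    """Mean attempts until one full run succeeds: 1 / per_step**steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    success = per_step ** steps
    if success == 0.0:
        return float("inf")  # the loop never converges
    return 1.0 / success


def expected_tokens(per_step: float, steps: int, tokens_per_attempt: int) -> float:
    """Expected tokens spent, assuming every attempt costs the same."""
    return tokens_per_attempt * expected_attempts(per_step, steps)


if __name__ == "__main__":
    TOKENS_PER_ATTEMPT = 50_000  # hypothetical figure, not from the article
    for rate in (0.95, 0.80):
        attempts = expected_attempts(rate, 20)
        tokens = expected_tokens(rate, 20, TOKENS_PER_ATTEMPT)
        print(f"per-step {rate:.0%}: ~{attempts:.1f} attempts, ~{tokens:,.0f} tokens")
```

With these placeholder inputs the 95% case averages about 2.8 attempts and the 80% case about 87, so the per-step rate, the input this article ties to language familiarity, moves the estimate far more than the attempt size does. A more careful model would charge failed attempts only for the steps they completed.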
In practice this means:
- New backend services default to Go or TypeScript unless there is a specific reason otherwise
- Frontend is TypeScript with React, because the model knows this stack better than any human ever will
- Data work is Python, even when the team would personally prefer something else
- Infrastructure is Terraform and bash, not Pulumi in some clever language
None of this is exciting. That is the point. The excitement now lives at the product layer, where the leverage is multiplied by how cheaply we can ship.
The harder truth
The romantic view of programming languages — that we pick them for expressiveness, for joy, for fit to the problem — is becoming a luxury good. It still exists, and it still matters for the small number of systems where correctness is paramount or where the domain is genuinely unusual. But for the median codebase, language choice is now a question of how well you can collaborate with a non-human teammate that has read most of the internet.
That teammate has read a lot of Python. It has read very little Roc. Until that changes, the rational move is to meet it where it lives. We think more teams will arrive at this conclusion in the next twelve months, and the ones that resist it will spend a lot of money learning it the slow way.