Something is converging in the agent ecosystem, and it became hard to ignore this week. Scroll through the trending repositories — Anthropic's skills examples, several community skill registries, a handful of skill-aware orchestrators — and the same primitive keeps surfacing under different names. Not tools. Not prompts. Not agents. Skills.
A skill, as the term is settling, is a self-contained bundle: a markdown instruction file, optional scripts, optional resources, and a manifest that tells an agent when to load it. It sits on disk. You can git clone it. You can audit it. You can hand it to a different model tomorrow and it still works. After a year of watching teams build bespoke tool-calling stacks that don't survive a vendor change, this is the first packaging format we've seen that feels genuinely portable.
Why skills, and why now
The agent space has cycled through several candidates for "the unit of capability." Function calling came first — vendor-specific JSON schemas, tightly coupled to a model's tokenizer quirks. Then MCP arrived and did real work standardising the transport layer between agents and tools. But MCP is a protocol, not a package. It tells you how an agent talks to a server; it doesn't tell you how to ship a reusable behaviour from one team's laptop to another's CI pipeline.
Skills fill that gap from the other direction. Where MCP standardises the wire, skills standardise the artifact. The two are complementary, and you can already see projects assuming both: a skill that, when activated, knows how to spin up or call into an MCP server. The skill is the thing humans curate and review. The MCP server is the thing the runtime executes.
The shape of a skill is almost suspiciously simple:
    my-skill/
    ├── SKILL.md          # instructions + frontmatter manifest
    ├── scripts/
    │   └── validate.py
    └── resources/
        └── schema.json
The frontmatter declares a name, a description used for activation, and sometimes allowed tools or required permissions. The body is prose — written for the model, not for a parser. That last point matters more than it sounds. A skill is legible to the engineer who wrote it, the engineer reviewing it in a PR, and the model that will eventually load it. There is no IDL, no codegen, no SDK lock-in.
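To make that concrete, here is a minimal sketch of what a SKILL.md might look like. The name and description fields are the common core across implementations; the exact frontmatter keys vary by host, so treat this as illustrative rather than a spec:

```markdown
---
name: postgres-migration-review
description: Review a proposed Postgres migration for locking, data-loss, and rollback risks. Use when asked to check a SQL migration file.
---

# Postgres migration review

1. Read the migration file the user points you at.
2. Flag statements that take long exclusive locks on large tables.
3. Confirm every destructive change has a documented rollback path.
```

The description is doing double duty: it documents the skill for humans and serves as the activation trigger for the model.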
The portability argument
We've been burned enough times by vendor-shaped abstractions to be cautious here, but skills genuinely look portable in a way prior formats did not. The reasons are structural:
- The format is filesystem-native. A skill is a directory. Version control, code review, diffing, signing, supply-chain scanning — all the tooling we already have for source code applies unchanged.
- The activation mechanism is description-based. A model decides to load a skill by reading its description, not by matching a hardcoded ID. That means the same skill folder works across any model capable enough to follow instructions — Claude, GPT, open-weights, whatever ships next quarter (see the sketch after this list).
- Execution is delegated. Skills don't bundle a runtime. They reference tools (shell, Python, MCP servers) that the host environment provides. Move the skill to a new host with the same tools available, and it runs.
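The activation mechanism is simple enough to sketch. Assuming frontmatter of plain key-value scalars, a host can build a skill index for the system prompt in a few lines of Python; the helper names and prompt shape here are ours, not any particular runtime's:

```python
from pathlib import Path

def parse_frontmatter(skill_md: str) -> dict[str, str]:
    # Assumes the file opens with a '---' frontmatter block of simple
    # key: value scalars; a real host would use a YAML parser.
    _, block, _ = skill_md.split("---", 2)
    fields = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields

def skill_index(skills_dir: Path) -> str:
    # One line per skill: name plus activation description. This index
    # goes into the system prompt; the model decides which skill to
    # load by reading descriptions, not by matching hardcoded IDs.
    lines = []
    for manifest in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_frontmatter(manifest.read_text())
        lines.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(lines)
```

The point of the sketch is the division of labour: discovery is just reading descriptions, so nothing about it is vendor-specific.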
Contrast this with the alternatives. A custom GPT is not portable. An OpenAI Assistant is not portable. A LangChain agent class is portable in the trivial sense that Python is portable, but it carries the entire framework with it. A skill carries instructions and maybe a few scripts. The blast radius of switching vendors collapses from "rewrite the agent" to "point a different model at the same folder."
Governance becomes tractable
This is the part we find genuinely interesting, because it's the part that has been blocking serious enterprise agent adoption. When capability lives inside model weights or vendor dashboards, you cannot meaningfully review it. You cannot tell your security team what an agent can do. You cannot diff yesterday's agent against today's.
When capability lives in a skills/ directory in a repository, all of that becomes ordinary engineering:
- Pull requests can require review for any change under skills/.
- CODEOWNERS can route security-sensitive skills (anything touching credentials, production data, deployment) to specific reviewers.
- CI can lint skills for forbidden patterns — shell-outs to unknown binaries, network calls to non-allowlisted hosts, prompt injection vectors in resource files (a sketch of such a linter follows this list).
- Signed commits and artifact signing (Sigstore, cosign) extend naturally to skill bundles.
- A skill registry — internal or public — becomes a package index, with all the supply-chain hygiene that implies.
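On the CI point, a linter doesn't need to be sophisticated to be useful. This is a minimal sketch under our own assumptions — the pattern list, allowlisted host, and file extensions are placeholders a team would replace with its actual policy:

```python
import re
import sys
from pathlib import Path

# Red flags in skill bodies and bundled scripts. Purely illustrative;
# a real policy would be team-specific and far more complete.
FORBIDDEN = [
    (re.compile(r"curl\s+https?://(?!internal\.example\.com)"),
     "network call to non-allowlisted host"),
    (re.compile(r"\beval\b"), "dynamic eval"),
    (re.compile(r"rm\s+-rf\s+/"), "destructive shell command"),
]

def lint_skill(skill_dir: Path) -> list[str]:
    findings = []
    for path in skill_dir.rglob("*"):
        if not path.is_file() or path.suffix not in {".md", ".py", ".sh"}:
            continue
        text = path.read_text(errors="replace")
        for pattern, reason in FORBIDDEN:
            if pattern.search(text):
                findings.append(f"{path}: {reason}")
    return findings

if __name__ == "__main__":
    problems = [finding
                for skill in Path("skills").iterdir() if skill.is_dir()
                for finding in lint_skill(skill)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```

Wire that in as a required check and the skills/ directory gets the same gate as any other code path.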
We've started treating skills the same way we treat Terraform modules: reviewed, versioned, tagged, published to an internal registry, consumed by reference. The mental model transfers cleanly because the artifact behaves the same way.
The governance question that remains open is runtime enforcement. A skill says "this agent should only read from the staging database." The skill itself cannot enforce that — the host runtime has to. This is where MCP servers, sandboxing, and capability-scoped credentials do the actual work. Skills declare intent; the platform enforces it. Conflating the two is the easiest way to get this wrong.
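What enforcement looks like in practice is a gate in the host, not logic in the skill. A minimal sketch, assuming a frontmatter field like allowed-tools (the field name and the gate API are our inventions):

```python
from typing import Any, Callable

class ToolPolicyError(PermissionError):
    """Raised when a skill calls a tool it never declared."""

class ToolGate:
    # The skill's frontmatter declares intent; this gate is the runtime
    # actually enforcing it. The skill cannot enforce anything itself.
    def __init__(self, declared: set[str], registry: dict[str, Callable[..., Any]]):
        self.declared = declared
        self.registry = registry  # tools the host environment provides

    def call(self, tool: str, *args: Any, **kwargs: Any) -> Any:
        if tool not in self.declared:
            raise ToolPolicyError(f"skill never declared tool {tool!r}")
        return self.registry[tool](*args, **kwargs)

# Every tool call in the agent loop routes through the gate, so a skill
# that declared only read-only staging access cannot quietly escalate.
gate = ToolGate(
    declared={"pg_query_readonly"},
    registry={"pg_query_readonly": lambda sql: print("read-only:", sql)},
)
gate.call("pg_query_readonly", "SELECT 1")   # allowed
# gate.call("pg_write", "DROP TABLE users")  # raises ToolPolicyError
```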
Reuse, finally
The other thing skills unlock is the kind of reuse we've been promised since the first "agent framework" repo hit GitHub and never quite delivered. Because skills are small, prose-based, and execution-light, they compose well. A pdf-extraction skill from one team works inside another team's invoice-processing agent without modification. A postgres-migration-review skill works in any agent with shell access to psql.
What we've been watching this week is the early shape of a skill ecosystem: community repos collecting useful skills, organisations publishing internal skill libraries, and — inevitably — the first skill registries with discovery, search, and dependency resolution. The trajectory rhymes with npm circa 2011 or Docker Hub circa 2014. Not identical, but the same gravitational pull toward a shared format that nobody has to negotiate.
The risks rhyme too. Skill sprawl is going to be real. A repository with 200 skills, half of them undocumented variants of each other, is the obvious failure mode. So is the skill that quietly assumes a tool is available, fails silently when it isn't, and produces plausible-looking garbage. We expect the next year to produce skill linters, skill test harnesses, and skill observability tooling, mostly because those are the things we're already building for ourselves.
What we'd actually do with this
If a team asked us today how to position for this convergence, the answer is unromantic: start writing skills, and start treating the skills/ directory as a first-class part of the codebase. Not a prompts/ folder full of one-off experiments — a curated, reviewed, tested directory with the same engineering rigor as src/.
Keep skills small. One skill, one capability. Resist the urge to write a 4,000-word mega-skill that tries to encode an entire workflow; the model will either ignore most of it or get confused by it. Compose multiple narrow skills instead, and let the agent's planning loop sequence them.
Keep skills honest about their dependencies. If a skill needs ripgrep, say so in the frontmatter. If it expects an MCP server on a particular socket, document it. The portability story only holds if skills declare what they need.
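A dependency check like that can be mechanical. Here is a sketch assuming a hypothetical requires: frontmatter field listing binaries; the field name is our invention, but the check itself is just a PATH lookup:

```python
import re
import shutil
from pathlib import Path

def missing_dependencies(skill_dir: Path) -> list[str]:
    # Reads a hypothetical 'requires: ripgrep, psql' frontmatter line and
    # reports declared binaries not on PATH, so the skill fails loudly at
    # load time instead of producing plausible-looking garbage later.
    text = (skill_dir / "SKILL.md").read_text()
    match = re.search(r"^requires:\s*(.+)$", text, re.MULTILINE)
    declared = [d.strip() for d in match.group(1).split(",")] if match else []
    return [dep for dep in declared if shutil.which(dep) is None]
```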
And keep skills out of the model. The temptation will be to fine-tune capability into weights for a marginal latency or cost win. That's how you lose portability, governance, and reuse simultaneously. Weights are the worst possible packaging format — opaque, unversioned, vendor-specific. Skills exist precisely because that approach doesn't scale across teams or time.
The convergence we're seeing isn't a standard yet. There's no RFC, no foundation, no formal spec. But the shape is consistent enough across independent projects that we'd bet on it sticking. Skills are what tools should have been: artifacts you can read, review, share, and move. That's a low bar, and it took the industry longer to clear it than it should have. Now that we're here, the work is to build like it matters.