Agent Skills Are the New npm Packages

The Fastest-Growing Category on GitHub

Something unusual happened on GitHub this week. The trending repositories page — normally a mix of developer tools, frameworks, and side projects — was almost entirely AI agent tooling. Not AI models. Not AI wrappers. Agent skills, subagent collections, meta-prompting systems, and development setups designed to make AI coding agents more capable.

The numbers are striking. Garry Tan, the president and CEO of Y Combinator, open-sourced his Claude Code configuration as a repository called gstack. It hit 27,000 stars in roughly ten days. It's a set of fifteen specialist roles — CEO, designer, engineering manager, code reviewer, QA lead, release engineer — implemented as slash commands in Markdown files. A curated list of Codex subagents passed 1,400 stars with 136 entries across ten categories. A collection of OpenClaw skills reached 40,000 stars. An "awesome" list of Claude Code subagents hit 14,000.

These aren't traditional open-source projects. Nobody is contributing code in the conventional sense. What's being shared are configurations, prompts, and behaviour definitions for AI coding agents. The closest analogy is the VS Code extensions marketplace or npm — a distribution layer for capabilities that plug into a shared platform.

What Agent Skills Actually Are

The term "skill" has emerged as the generic label for a reusable capability you can give an AI coding agent. The specifics vary by platform, but the pattern is consistent:

A skill is typically a Markdown or TOML file that defines a role, a set of instructions, constraints, and sometimes tool access for an AI agent. When you install a skill, you're giving your agent a new behaviour it can invoke — a code reviewer that checks for security issues, a QA process that opens a real browser and clicks through your app, or an autonomous iteration loop that makes changes, tests them, and keeps or reverts based on a metric.

Claude Code uses Markdown files in a .claude/skills/ directory. Each file is a prompt with metadata — when to trigger, what tools to use, what constraints to follow. Codex uses .toml files in a .codex/agents/ directory with structured fields for the model, system prompt, and sandboxing configuration. OpenClaw has its own format. None of these are compatible with each other.

The autoresearch pattern, popularised by Andrej Karpathy and now implemented as a Claude Code skill with 1,500 stars, illustrates what a well-designed skill looks like. It turns an AI agent into a relentless optimisation loop: define a goal, define a metric, then let the agent make one change per iteration, verify the result, keep improvements, revert failures, and repeat indefinitely. Every improvement compounds. Every failure auto-reverts. Git is the memory. The human sets the objective and goes to sleep.

Why This Matters

This is a pattern we've seen before. It happened with npm packages in the Node.js ecosystem around 2013. It happened with Docker images around 2015. It happened with VS Code extensions around 2017. In each case, a platform reached critical mass, a distribution primitive emerged, an ecosystem of shareable components exploded, and then the platform that attracted the most packages won.

The difference this time is the speed. npm took years to reach its first 100,000 packages. The agent skill ecosystem has gone from zero to tens of thousands of stars across multiple curated collections in weeks. Partly that's because skills are cheaper to create — many are single files rather than code libraries. Partly it's because the demand is intense: people are using AI coding agents for real work and discovering that the default behaviour isn't enough.

Garry Tan's gstack is a useful case study. He describes it as "a virtual engineering team you actually manage." The framing is deliberate — each skill represents a role in a software organisation, not just a prompt. The CEO skill rethinks the product before you write code. The engineering manager locks the architecture. The reviewer finds production bugs. The QA lead opens a real browser. This isn't prompt engineering in the traditional sense. It's organisational design, expressed as agent configuration.

The Fragmentation Problem

The obvious problem is that every platform has its own skill format, and none of them talk to each other.

Claude Code skills are Markdown files with a specific frontmatter schema. Codex subagents are TOML files with different fields. OpenClaw has yet another format. If you write a useful code review skill for Claude Code, it doesn't work in Codex. If someone builds a security auditor for Codex, you can't use it with Claude Code. The skills are conceptually identical — "tell an agent how to do a specific job" — but the implementations are incompatible.

This is the classic standards problem, and the history of developer tooling suggests it won't resolve cleanly. npm won because Node.js won, not because npm was technically superior to Bower or Component. Docker Hub won because Docker won. The agent skill format that wins will be the one attached to the platform that gets the most daily active developers.

Right now, that race is genuinely open. Claude Code, Codex, and OpenClaw each have substantial user bases and growing ecosystems. The star counts on skill collections suggest broad adoption across all three. Unlike the npm era, where the competing package managers were all targeting the same runtime, agent skills target fundamentally different platforms backed by different model providers. There's no obvious reason one should win.

What's Missing

The current state of agent skills has the energy of early npm — lots of packages, minimal quality control, and no real dependency management. A few gaps are worth noting:

No verification. When you install an npm package, you can at least read the source code and check for known vulnerabilities. When you install an agent skill, you're loading a prompt that will be fed to a model that can execute code on your machine. The attack surface is significant. A malicious skill could instruct the agent to exfiltrate code, install backdoors, or modify files in ways that aren't immediately visible. None of the current ecosystems have code signing, review processes, or automated security scanning for skills.

No composability standard. Skills can't reference other skills in a standardised way. There's no equivalent of package.json dependencies. If your QA skill assumes a specific project structure that your architecture skill sets up, that relationship exists only in your head.

No versioning. Most skills are distributed as files in a Git repository. There's no semantic versioning, no lock files, no way to pin a specific version of a skill across a team. When the skill author updates their prompt, your workflow changes without warning.

No metrics. Nobody is measuring whether skills actually improve outcomes. Garry Tan shares his commit statistics, but there's no standardised way to evaluate whether a code review skill catches more bugs, whether a QA skill reduces regressions, or whether a planning skill leads to better architecture. The ecosystem is growing on vibes, not evidence.

The Platform Bet

For developers and engineering teams, the practical question is whether to invest in building skills for a specific platform. The answer probably depends on your planning horizon.

In the short term, agent skills clearly work. People are getting measurable value from structured roles, automated review processes, and autonomous iteration loops. If you're already using Claude Code or Codex, building skills for your specific workflows is a worthwhile investment. The files are small, the iteration cycle is fast, and the worst case is that you've documented your engineering processes as structured prompts — which has value even if the platform changes.

In the medium term, the fragmentation is a real cost. Any skill you write today is locked to a single platform. If you're a team that might switch between AI coding tools — which is everyone, given how fast the landscape is moving — you're accumulating platform-specific technical debt with every skill you create.

The resolution will probably look familiar: either one platform wins decisively and its format becomes the standard, or a cross-platform skill format emerges that all platforms can consume. Anthropic's Model Context Protocol (MCP) is one candidate for part of this — it already standardises tool discovery and invocation across platforms. But MCP handles "what tools exist" rather than "how to use them well," and the latter is what skills encode.

What Comes Next

The agent skill ecosystem is moving too fast for confident predictions, but a few trends seem likely:

Skill marketplaces will emerge, probably sooner than most people expect. The pattern of curated "awesome" lists on GitHub is a precursor — it's the same function as an app store, just less sophisticated. Stripe's Machine Payments Protocol, which we wrote about yesterday, could eventually enable paid skills with per-use pricing.

Skill quality will become a differentiator. Right now the ecosystem rewards novelty. That will shift toward reliability as teams depend on skills for production workflows. The skill authors who invest in testing, documentation, and backward compatibility will build the packages that teams actually trust.

And the platform that makes skills easiest to create, share, discover, and compose will attract the most developers — the same dynamic that made npm, Docker Hub, and the VS Code Marketplace dominant in their respective ecosystems. The agent skill war is a platform war, and it's just getting started.