Don't build agents. Build skills.

A working mental model for agent systems that don't collapse in production — where the agent is thin, intent-only orchestration and every capability is a testable skill.

Author: Lali Devamanthri

Most agent projects I review have the same shape: a single LLM call with a fat, ever-growing system prompt, a pile of tools bolted on, and a team that's afraid to touch any of it. When something breaks, no one can tell you which layer broke. Observability is a wall of trace logs and vibes.

The mental model is wrong. You don't build an agent and then add capabilities to it. You build skills — small, testable units of real work — and the agent is what's left after you extract them.

Everything below is a four-layer version of that model. The agent holds intent. The skills hold implementation. They meet through the filesystem.

A vertical architecture diagram with four layers. At the top, the AI Engineer defines intent by writing SKILL.md files. Below, the Claude Agent reads those files and orchestrates calls. Flanking the agent, two MCP servers extend its reach — one for tools, one for APIs. Below the agent sits the Filesystem, the persistent layer where SKILL.md and outputs live. At the bottom is the Skills Layer: two active skills (read/write/search and parse/transform) and two composable skills (summarise/draft and analyse/report) — each a self-contained, testable unit.

Figure: Four layers of a skills-first agent system. The AI Engineer instructs the Agent, which accesses MCP servers and the filesystem, which in turn hosts the executable Skills Layer.

Four layers. One direction of intent.

L1 · The human in the loop

You are the AI engineer. You write SKILL.md files, configure MCP servers, and run evals. The thing you do not do is write business logic inside the agent — not in prompts, not in a giant system message, not in a tangle of conditional branches. Logic lives one layer down.

L2 · The orchestrator

Claude — or whichever model you're using — is the orchestrator. Its whole job is to read the available skills, decide which one to call, and sequence calls toward the user's goal. Zero business logic. Zero bespoke prompts for individual capabilities. Move a skill, add a skill, change a skill — the agent layer stays untouched.
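To make "zero business logic" concrete, here is a minimal sketch of what the orchestrator reduces to. Everything in it is an assumption for illustration: `run_agent`, the `choose` callback, and the two toy skills are hypothetical names, and `fake_choose` stands in for a real model call so the example runs offline.

```python
from typing import Callable, Dict, List

# A skill is just a named callable here; a real skill carries a fuller contract.
Skill = Callable[[str], str]


def run_agent(
    goal: str,
    skills: Dict[str, Skill],
    choose: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Ask `choose` for a sequence of skill names, then run them in order."""
    plan = choose(goal, sorted(skills))
    outputs: List[str] = []
    payload = goal
    for name in plan:
        if name not in skills:
            raise KeyError(f"model picked unknown skill: {name}")
        # Each skill receives the previous skill's output.
        payload = skills[name](payload)
        outputs.append(payload)
    return outputs


def fake_choose(goal: str, available: List[str]) -> List[str]:
    """Stand-in for the model (assumption): always plans parse, then summarise."""
    return [name for name in ("parse", "summarise") if name in available]


skills = {
    "parse": lambda text: text.strip().lower(),
    "summarise": lambda text: text[:20],
}
result = run_agent("  Quarterly REPORT on claims backlog  ", skills, fake_choose)
```

Note that adding a third entry to `skills` changes nothing inside `run_agent`. That is the property the layer is meant to have.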

Tools vs skills

MCP servers give the agent reach: filesystem access, APIs, search. But a tool is not a skill. Tools are primitives — verbs like read, fetch, query. Skills are compositions of tools with a contract: inputs, outputs, error modes, an eval. If you can't write a passing test for it, it's a tool, not a skill.
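One way to pin down the tool/skill distinction is to write the contract as a data structure. The sketch below is an assumption, not a prescribed schema: `SkillContract` and its field names are hypothetical, and the eval is reduced to exact-match comparison for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class SkillContract:
    """Hypothetical contract: inputs, outputs, error modes, and an eval."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]
    input_keys: Tuple[str, ...]
    output_keys: Tuple[str, ...]
    error_modes: Tuple[str, ...] = ()
    # Each eval case is an (input, expected_output) pair.
    eval_cases: List[Tuple[Dict[str, Any], Dict[str, Any]]] = field(default_factory=list)

    def is_skill(self) -> bool:
        """No eval cases means it is still a tool, not a skill."""
        return len(self.eval_cases) > 0

    def passes_eval(self) -> bool:
        """True only if there are eval cases and every one matches exactly."""
        return self.is_skill() and all(
            self.run(inp) == expected for inp, expected in self.eval_cases
        )


def count_words(inp: Dict[str, Any]) -> Dict[str, Any]:
    return {"count": len(inp["text"].split())}


word_count = SkillContract(
    name="word_count",
    run=count_words,
    input_keys=("text",),
    output_keys=("count",),
    error_modes=("empty_input",),
    eval_cases=[({"text": "thin agent fat skills"}, {"count": 4})],
)

# Same shape, but no eval: by the rule above this is only a tool.
untested = SkillContract(
    name="fetch", run=lambda inp: inp, input_keys=("url",), output_keys=("body",)
)
```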

L3 · The persistent layer

The filesystem is not a detail. It's where SKILL.md files live, where outputs are written, and where the context the agent reasons over is durable. Once you make this choice ("outputs are files, state is files"), a lot of the usual agent-framework machinery (memory stores, vector DBs, orchestration engines) becomes optional. The model still decides what to call, but the filesystem does the coordinating that an orchestration engine otherwise would.
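A small sketch of "outputs are files, state is files", using only the standard library. The run directory layout and the `001_parse.json` naming scheme are assumptions made for this example; the point is only that a later step can rebuild its context by reading the directory.

```python
import json
import tempfile
from pathlib import Path


def write_output(run_dir: Path, step: int, skill: str, data: dict) -> Path:
    """Persist one skill's output as a numbered JSON file."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"{step:03d}_{skill}.json"
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return path


def load_history(run_dir: Path) -> list:
    """Rebuild the run's state from disk, in step order."""
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(run_dir.glob("*.json"))
    ]


# A temporary directory keeps the example self-cleaning.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run-0001"
    write_output(run_dir, 1, "parse", {"rows": 3})
    write_output(run_dir, 2, "summarise", {"summary": "3 rows parsed"})
    history = load_history(run_dir)
```

Because the state is plain files, you can inspect a failed run with `ls` and `cat` instead of a trace viewer.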

L4 · The skills layer

Each skill is a self-contained unit, defined in a SKILL.md with the prompt, the input contract, the expected output shape, and its eval cases. You ship a new capability by dropping in a new file. You debug a broken capability by running its eval. A skill that passes its eval in isolation should behave the same way inside the agent, as long as the agent calls it with inputs that satisfy the contract.
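As an illustration, here is one possible SKILL.md layout and a check that all four parts are present. The `## Prompt`, `## Input`, `## Output`, and `## Evals` headings are an assumption made for this sketch, not an official SKILL.md format, and `parse_sections` and `missing_sections` are hypothetical helpers.

```python
# Hypothetical SKILL.md content; the section names are assumed for this sketch.
SKILL_MD = """\
# summarise

## Prompt
Summarise the input text in one sentence.

## Input
text: string

## Output
summary: string

## Evals
- input: "a b c" -> expects a non-empty summary
"""

REQUIRED = ("prompt", "input", "output", "evals")


def parse_sections(markdown: str) -> dict:
    """Split a document into {section name: body} on '## ' headings."""
    sections, current = {}, None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip().lower()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_sections(markdown: str) -> list:
    """List required sections that are absent or empty."""
    found = parse_sections(markdown)
    return [name for name in REQUIRED if not found.get(name)]
```

Running `missing_sections` over every SKILL.md in CI is a cheap way to stop a half-specified skill from shipping.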

The principle

Your agent's only job is to decide which skill to call — and when. Everything else — parsing, writing, summarising, analysing, API calls, database queries — is a skill. Thin agent, fat skill layer. The opposite of how most teams start.

Three consequences

Once you adopt this shape, three things fall out of it for free:

  • Skills hold implementation. Agents hold intent. Debugging is localised. A capability that's broken is a broken skill, and you know which one from the trace.
  • A broken skill is a scoped bug. A broken agent is a catastrophe. You want as much of the system as possible in the first category, and the skills-first model puts it there.
  • New capability = new SKILL.md. No agent rewrites. No prompt surgery. No regressions in unrelated flows. This is the part that makes the system ship-able under audit.
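The third consequence can be sketched as directory-based discovery: if the agent lists skills by scanning for SKILL.md files, adding a capability is a file operation and the agent code does not change. The one-directory-per-skill layout and the `discover_skills` helper are assumptions for this example.

```python
import tempfile
from pathlib import Path


def discover_skills(root: Path) -> list:
    """Return one skill name per sub-directory that contains a SKILL.md."""
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("parse", "summarise"):
        (root / name).mkdir()
        (root / name / "SKILL.md").write_text(f"# {name}\n", encoding="utf-8")
    before = discover_skills(root)

    # Shipping a new capability: drop in a new file, touch nothing else.
    (root / "analyse").mkdir()
    (root / "analyse" / "SKILL.md").write_text("# analyse\n", encoding="utf-8")
    after = discover_skills(root)
```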

Where this usually falls over

Two failure modes I see repeatedly:

  1. Skills that aren't skills. Teams call something a skill when it's really a tool with a wrapper prompt. If there's no eval, it's not a skill — it's a tool you haven't tested yet.
  2. Agents that aren't thin. The agent's prompt grows because someone adds "handle case X" logic there instead of in a new skill. Every addition there is technical debt. Push the special case down.
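Both failure modes can be caught mechanically, at least roughly. The sketch below is a hypothetical audit you might run in CI: it flags skills with no eval cases and an agent prompt that has outgrown a line budget. The `audit` function and the 40-line default are assumptions, and prompt length is only a crude proxy for "logic has leaked into the agent".

```python
def audit(skill_evals: dict, agent_prompt: str, max_prompt_lines: int = 40) -> list:
    """Return human-readable problems; an empty list means the audit passed."""
    problems = []

    # Failure mode 1: a "skill" with no eval is an untested tool.
    for name, cases in sorted(skill_evals.items()):
        if not cases:
            problems.append(f"{name}: no eval cases, treat it as an untested tool")

    # Failure mode 2: the agent prompt keeps growing.
    lines = [line for line in agent_prompt.splitlines() if line.strip()]
    if len(lines) > max_prompt_lines:
        problems.append(
            f"agent prompt has {len(lines)} non-empty lines "
            f"(budget {max_prompt_lines}); push special cases into skills"
        )
    return problems


skill_evals = {"summarise": ["case-1"], "fetch": []}
agent_prompt = "Pick the skill that matches the goal.\nCall it.\n"
report = audit(skill_evals, agent_prompt)
```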

Build skills. The agent is what's left over.
