In partnership with

Build a LinkedIn Growth Routine That Actually Compounds

Taplio helps you grow followers with consistent posting, boost visibility with smart engagement, and iterate on what’s working with advanced analytics.

All in one place.

Try free for 7 days + $1 for your first month with code BEEHIIV1X1.

When it all clicks.

Why does business news feel like it’s written for people who already get it?

Morning Brew changes that.

It’s a free newsletter that breaks down what’s going on in business, finance, and tech — clearly, quickly, and with enough personality to keep things interesting. The result? You don’t just skim headlines. You actually understand what’s going on.

Try it yourself and join over 4 million professionals reading daily.

55,000 Tokens Before Claude Even Thinks

ResearchAudio.io


Skills vs MCP: the confusion costing you context window, money, and accuracy

55K tokens burned by 5 MCP servers
~30 tokens per Skill at startup
10K+ public MCP servers
I watched a developer spend an entire afternoon building a custom MCP server so Claude would follow his team's code review checklist. JSON-RPC protocol, TypeScript SDK, auth config, a running server process. Forty-seven files committed to a repo.
He needed a Markdown file.
One SKILL.md with the checklist in it. Claude would have loaded it automatically the next time someone asked for a code review. No server. No protocol. No infrastructure.
This confusion is everywhere. And it is not just wasted time: it burns real tokens and real money, and it makes your model less accurate. Today I am going to explain exactly how both work under the hood so you never mix them up again. You will also walk away with templates you can copy today.

MCP gives the model access to your tools. Skills teach it how to use them. One is a USB-C port. The other is driver software.

TL;DR (then keep reading for the depth)

MCP = connectivity. Connects Claude to external tools (GitHub, databases, Stripe). Running server process, JSON-RPC, each tool burns 200-500 tokens. 5 servers = 55,000 tokens before your prompt.

Skills = knowledge. A Markdown file that teaches Claude your procedures and rules. ~30 tokens at startup. Full instructions load only when relevant. No server needed.

Together = the real power. MCP gives reach. Skills give judgment. Claude with both is an expert who can actually touch your systems.

The action: Find the procedure you have explained to Claude more than 3 times. Turn it into a SKILL.md. Pair with 2-3 MCP servers.

That is the summary. Now let me show you exactly how each one works, with real code, real token math, and templates you can copy.
What a Skill Actually Is (With Real Code)
A Skill is a folder with a SKILL.md file. That is it. Markdown with optional YAML metadata. No SDK. No server. No build step.
Simon Willison (creator of Datasette) said it best: MCP is a full protocol specification with hosts, clients, servers, and three different transports. Skills are Markdown. They feel closer to the spirit of how LLMs actually work. Throw in some text and let the model figure it out.
The clever part is progressive disclosure. At startup, Claude loads only each Skill's name and description. About 30 tokens per Skill. If a Skill matches the current task, the full instructions load. If not, it stays dormant. You can have dozens of Skills installed without any context window cost.
But the structure of that file matters. Here is a real production Skill:
---
name: db-query-optimizer
description: Analyze PostgreSQL slow queries
  and suggest indexes following team conventions
---

# Database Query Optimizer

## When to use this skill
Trigger when user mentions slow queries,
index optimization, or pg_stat_statements.

## Step-by-step procedure
1. Query pg_stat_statements for queries
   over 100ms avg time
2. Check existing indexes on those tables
   using pg_indexes view
3. For partitioned tables, suggest partial
   indexes (team preference)
4. Never suggest indexes on columns with
   <1% selectivity
5. Output format: table name, query,
   current time, suggested index DDL

## Rules
- Always use CONCURRENTLY for production
- Tables in schema "pii" must never
  appear in any output
- Use team naming: idx_{table}_{cols}
Four parts, each with a specific job:
The YAML metadata (name + description) is loaded at startup for every installed Skill. This is all Claude reads initially: ~30 tokens per Skill. If you have 20 Skills installed, that is only ~600 tokens total. Compare that to a single MCP server which can consume 8,000 tokens.
The trigger conditions tell Claude when to activate the Skill. Without these, Claude loads the full Skill for irrelevant tasks and wastes context. Be specific: "slow queries, index optimization, pg_stat_statements" is better than "database stuff."
The procedure is where the real value lives. This is institutional knowledge: your team's exact workflow, encoded as numbered steps. "Query pg_stat_statements for queries over 100ms" is the kind of specificity that MCP cannot provide. MCP gives the ability to query. The Skill tells Claude what to query and how to interpret the results.
The guardrails prevent mistakes the model would not know to avoid. "Tables in schema 'pii' must never appear in any output" means Claude will not leak sensitive data in a report, even if the MCP tool returns it. This is safety encoded as plain text. MCP cannot do this.
And Skills are going cross-platform. The same SKILL.md file now works across Claude Code, Codex, Gemini CLI, and other tools through the Agent Skills standard (agentskills.io). Write once, use everywhere.
What MCP Actually Is (And What It Costs You)
The Model Context Protocol is an open standard (JSON-RPC) that connects AI models to external systems. Anthropic released it in November 2024; OpenAI, Google, Microsoft, and AWS backed it. It is now governed by the Linux Foundation, with 97 million monthly SDK downloads and over 10,000 public servers.
An MCP server runs as a separate process on your machine and exposes tools the model can call. Here is what a real configuration looks like:
// .mcp.json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "@modelcontextprotocol/server-postgres",
        "postgresql://user:pass@localhost/mydb"
      ]
    },
    "github": {
      "command": "npx",
      "args": ["@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "ghp_xxxx..."
      }
    }
  }
}

// Each server makes tools available to Claude:
// postgres → query, list_tables
// github → create_pr, list_issues, get_file...
// (~20 tools ≈ 8,000+ tokens loaded at startup)
Here is the cost most people do not realize. Every MCP tool generates a JSON schema that loads into the context window: name, description, parameter types, required fields, enum values. A single complex tool (like GitHub's create_pull_request with its 15 parameters) can consume 400+ tokens.
GitHub's official MCP alone has ~20 tools. That is roughly 8,000 tokens at startup. Add PostgreSQL, Slack, Jira, and Stripe and you hit 55,000 tokens before Claude even starts thinking about your actual question.
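The arithmetic is worth sanity-checking yourself. A quick sketch using the article's rough estimates (the per-tool and per-Skill figures are approximations, not measured values):

```python
# Back-of-envelope startup cost, using the article's rough estimates.
TOKENS_PER_MCP_TOOL = 400    # one complex tool schema, e.g. create_pull_request
TOKENS_PER_SKILL_STUB = 30   # a Skill's name + description, all that loads at startup

github_mcp = 20 * TOKENS_PER_MCP_TOOL        # GitHub's ~20 tools
twenty_skills = 20 * TOKENS_PER_SKILL_STUB   # twenty installed Skills

print(github_mcp)     # 8000 tokens consumed before your first prompt
print(twenty_skills)  # 600 tokens, with full instructions still dormant
```

One MCP server costs more than ten times what twenty Skills do, and the Skill cost stays flat until a Skill actually matches a task.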
OpenAI's own guidance: keep under 20 tools per agent. Accuracy degrades past 10. GPT-5.4's new Tool Search feature reduces overhead by ~85% through on-demand loading. But the underlying constraint stays: more tools means more noise in the context window, and the model starts picking the wrong tool for the job.
And here is the thing most people do not realize: MCP gives Claude the ability to query your database. It does not give Claude any knowledge about your schema, your conventions, your team's preferences, or which data is sensitive. Claude will run generic SQL, return generic results, and format them generically. That is the gap Skills fill.
MCP SERVER (GITHUB): ~8,000 tokens at startup (running server, auth, network)
vs
SKILL (CODE REVIEW): ~30 tokens at startup (just a Markdown file, loads on demand)
The Example That Makes It Click Forever
You want Claude to find slow database queries and suggest indexes. Three versions of this, each one better:
With only MCP: Claude connects to PostgreSQL via MCP. Runs a generic SELECT * FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 20. Returns results. But does not know your team uses partial indexes. Does not know table X is partitioned and needs a different strategy. Does not know tables in the "pii" schema should never appear in output. Does not know your naming convention (idx_{table}_{cols}). Suggests CREATE INDEX without CONCURRENTLY, which locks your production table.
With only a Skill: Claude loads the db-query-optimizer Skill. Knows exactly what to query, how to analyze it, which index types to suggest, and which tables to exclude. But cannot connect to the database. All that knowledge with no way to act on it. Claude would write a beautiful plan and then ask you to run the queries manually.
With both: Claude loads the Skill (procedure + guardrails), then calls PostgreSQL MCP to query pg_stat_statements with the specific filters from the Skill. Checks existing indexes using pg_indexes. For partitioned tables, suggests partial indexes (per team preference). Formats results with your naming convention. Uses CONCURRENTLY. Excludes PII tables. That is the difference between a generic tool and an expert who can actually touch your database.
Under the Hood: What Happens Step by Step
You type: "Find our slowest queries and suggest indexes." Here is the exact sequence:
1. Skill metadata scan. Claude reads every installed Skill's name + description. About 30 tokens each. With 15 Skills, that is 450 tokens. Claude sees "db-query-optimizer: Analyze PostgreSQL slow queries and suggest indexes following team conventions." Match found.
2. Full Skill loads into context. The complete SKILL.md now enters the context window: the 5-step procedure, the guardrails about PII tables, the naming conventions, the CONCURRENTLY rule. Claude now has the methodology before it touches any data.
3. Model plans. Claude reads the Skill, plans the approach: "Step 1 says query pg_stat_statements with 100ms threshold. I need the PostgreSQL MCP tool for that." It identifies which MCP tools to call and in what order.
4. MCP tool calls execute. Claude calls the PostgreSQL MCP's "query" tool with the specific SQL from the Skill's procedure. Then calls it again to check existing indexes via pg_indexes. Raw data returns to the context window.
5. Analysis + output. Claude filters out PII tables (guardrail). For partitioned tables, suggests partial indexes (Step 3 of procedure). Names everything idx_{table}_{cols} (team convention). Wraps DDL in CONCURRENTLY (production rule). Outputs the exact format specified in the Skill.
Notice how the Skill shapes every decision, and MCP executes every action. Neither alone gets close to this result.
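The five steps above can be sketched as a toy loop. Every name here is hypothetical; this shows the shape of the flow, not Claude's actual internals:

```python
# Toy sketch of the Skill + MCP sequence: metadata scan -> load matching
# Skill -> plan -> MCP calls -> guarded output. All names are invented.
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str   # the ~30-token stub scanned at startup
    body: str          # full SKILL.md, loaded only on a match

def run_task(prompt: str, skills: list[Skill], mcp_call) -> str:
    # Step 1: metadata scan. Only name + description enter context.
    matched = [s for s in skills
               if any(w in prompt.lower() for w in s.description.lower().split())]
    # Step 2: full Skill bodies load for matches only (progressive disclosure).
    context = "\n".join(s.body for s in matched)
    # Steps 3-4: the model plans from the Skill, then acts through MCP tools.
    # (In the real flow the SQL and filters come from the Skill's procedure.)
    rows = mcp_call("query", sql="SELECT query, mean_time FROM pg_stat_statements")
    # Step 5: guardrails from the Skill would filter and format this output.
    return f"{len(matched)} skill(s) loaded ({len(context)} chars of procedure); {len(rows)} rows analyzed"
```

The point the sketch makes concrete: the Skill body only costs context when its description matches, and the MCP call is shaped by what the Skill loaded, not the other way around.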
The 10-Second Decision Framework
Every time you want to extend your AI setup, ask one question: does the model need to reach something, or know something?
Reach? MCP. Know? Skill. Both? Use both.
"Claude needs to follow our PR review checklist"Skill
"Claude needs to read my private GitHub repo"MCP
"Make all newsletters follow our brand voice"Skill
"Query our PostgreSQL database directly"MCP
"Find slow queries using our team's optimization rules"Both
"Auto-fix bugs: read repo, reproduce, patch, verify"Both
3 Skills You Can Copy and Use Today
Each one takes 10 minutes. Drop them in your Skills directory and they work immediately.
1. PR Reviewer
---
name: pr-reviewer
description: Review PRs using team standards
---
# PR Review Checklist
1. Check for missing error handling
2. Verify all new functions have tests
3. Flag any hardcoded credentials
4. Check naming follows camelCase
5. Verify no console.log in prod code

## Output
Table: file, line, issue, severity
Auto-triggers on every code review. Same checklist, every time, without anyone remembering the rules.
2. Incident Responder
---
name: incident-responder
description: Triage production incidents
---
# Incident Response
1. Check service health dashboard
2. Identify affected services
3. Check recent deploys (last 2h)
4. Check dependency status
5. Classify: P1/P2/P3/P4

## Escalation
P1: page on-call immediately
P2: Slack #incidents, 30min window
P3: next business day

## Rules
- Never restart prod without approval
- Never share customer data in Slack
Pair with Datadog or PagerDuty MCP for live telemetry. Skill = the runbook. MCP = the live data.
3. Commit Message Enforcer
---
name: commit-messages
description: Conventional commit format
---
# Commit Messages
Format: type(scope): description
Types: feat, fix, refactor, docs, test, chore
Max 72 chars first line
Body: explain WHY, not WHAT

## Examples
feat(auth): add OAuth2 PKCE flow
fix(api): handle null response body

## Never allow
- "fix bug" (too vague)
- "update" (says nothing)
- "WIP" in main branch
Auto-triggers every commit. Consistent git history without anyone remembering the convention.
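Installing any of the three Skills above is just writing a file. A minimal sketch; the ~/.claude/skills/ path follows Anthropic's docs for personal Skills in Claude Code (project Skills live in .claude/skills/), so verify the location for your client:

```python
# Scaffold a Skill folder on disk: <skills dir>/<name>/SKILL.md.
# The default directory is an assumption based on Claude Code's layout;
# pass a different root if your client looks elsewhere.
from pathlib import Path

SKILLS_DIR = Path.home() / ".claude" / "skills"

def install_skill(name: str, body: str, root: Path = SKILLS_DIR) -> Path:
    """Write <root>/<name>/SKILL.md and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(body, encoding="utf-8")
    return path

# Abbreviated body of the PR Reviewer Skill from above.
pr_reviewer = """\
---
name: pr-reviewer
description: Review PRs using team standards
---
# PR Review Checklist
1. Check for missing error handling
"""

# install_skill("pr-reviewer", pr_reviewer)
```

No build step, no server process: once the file exists, the client picks up the name and description on its next metadata scan.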

WHAT I'D BUILD THIS WEEK

Step 1: Think about the last procedure you explained to Claude more than three times. Your deploy process, your data analysis method, your writing style guide. Turn it into a SKILL.md. That is the highest-leverage 20 minutes you will spend this month.
Step 2: Limit MCP to 2-3 servers for your primary external tools. GitHub, your database, one domain tool. Each server you add costs thousands of tokens and reduces tool-picking accuracy.
Step 3: For every MCP server you use, ask: "Does Claude know how to use this well, or is it just running generic commands?" If generic, build a companion Skill. MCP + Skill together is always better than either alone.
Most teams over-invest in MCP (it feels like real engineering) and under-invest in Skills (it feels like writing documentation). But Skills are 50x cheaper in tokens, require zero infrastructure, and often have a bigger impact on output quality.

THE BIGGER PICTURE

Skills and MCP are two layers of a four-layer stack. Above them sits context engineering: the discipline of deciding what enters the context window, when, and in what order. Gartner declared 2026 "The Year of Context." Karpathy calls it the discipline that replaced prompt engineering.
Above that sits harness engineering: the full orchestration layer (tools + memory + retries + guardrails + feedback loops) that turns a model into a reliable system. OpenAI's team used harness engineering to build a million-line codebase with zero manually typed code.
Four concepts. One stack. I will cover the next two in the coming issues.
For now: what is the procedure you keep re-explaining to Claude? Hit reply. That is your first Skill.
Next issue: Context engineering. What it actually means, how it differs from prompt engineering, and the specific patterns that make AI systems 10x more reliable.


Sources: Anthropic Skills Docs · Simon Willison · Claude Help Center
