
What MCP Servers Actually Cost You in Tokens
MCP servers spend tokens before your agent does a single thing. Every tool a server exposes gets its definition loaded into the context window on every request — the GitHub MCP server alone is around 55,000 tokens of tool definitions before you type a word. Here's how I measure that “context tax,” what it actually costs per run, and the four controls I use to keep it from quietly eating my agent budget.
The Short Answer: You Pay to Describe Tools You Never Use
When an MCP server connects to your agent, the model has to be told what tools exist, what each one does, and what arguments it takes. That description is input tokens, and it rides along on every request in the conversation — whether or not the agent ever calls the tool. A connected-but-unused server isn't free. It's a standing charge you pay on every single turn.
I ran into this the honest way: I'd wired up a stack of MCP servers on an agent because I might need all of them, felt the responses get slower and pricier, and went looking for where the tokens went. Most of them were gone before my agent read the first instruction. The tools weren't the problem. The descriptions of the tools were.
How Many Tokens Are We Actually Talking About?
The median MCP server is small — under 2,000 tokens of tool definitions across the thousands that have been measured. But medians hide the pain, because a handful of popular servers are enormous, and those are exactly the ones people connect. Here's the shape of it:
| Setup | Approx. tools | Tokens before first message |
|---|---|---|
| Median single server | a few | ~1,900 |
| GitHub MCP | ~93 | ~55,000 |
| GitHub + Linear + Supabase | 100+ | ~24,000 (12% of a 200k window) |
| “Kitchen sink” multi-server agent | 200+ | 150,000+ |
Put a price on it. At a representative $3 per million input tokens, 90,000 tokens of schema overhead is about $0.27 per request. That sounds trivial until you run the agent on a loop. A background job that fires that agent a thousand times a day is spending roughly $270 a day describing tools it mostly never calls — around $8,000 a month of pure overhead. This is the exact kind of silent, structural cost I obsess over when I build autonomous systems, and it's a close cousin of the runaway-loop trap I wrote about in running AI agents on autopilot without torching your budget.

Fix #1: Measure the Tax Before You Argue About It
You can't cut what you don't measure. Before I touch anything, I look at how many tokens the connected servers add before the first user message — the standing overhead. Most agent runtimes will report input token counts per request; the schema overhead is the floor you never drop below no matter how short your prompt is.
The trick is to make this a number you watch, not a thing you discover on a bill. Add a new server, check the floor moved by X tokens, decide if the capability is worth X on every future request. That one habit turns “why is this agent so expensive?” into a decision you made on purpose.
Fix #2: Connect Fewer, Scoped Servers
The most common mistake I see — and made myself — is treating MCP servers like browser tabs: open them all, just in case. Don't. Give each agent only the servers its actual job needs. A support-triage agent does not need your cloud provider's 60-tool server hanging off it. A deploy agent does not need the CRM.
Scoping is the highest-leverage fix because it removes the overhead entirely instead of shrinking it. A server you don't connect costs zero tokens. When I split one bloated do-everything agent into two scoped ones, each one's per-request floor dropped hard, and both got a little sharper too — fewer tools means fewer chances for the model to reach for the wrong one. That's the same “give the model less to trip over” principle behind the gateway I described in my write-up on routing agents through a single gateway.
Fix #3: Prune the Tools You Never Call
Even a server you legitimately need often exposes far more tools than you use. The GitHub server ships around 93; a given agent might touch five. Every one of the other 88 still loads its definition into context on every request. If your runtime lets you allowlist or filter the tools a server exposes, use it — keep the five, drop the rest, and watch the floor fall.
I think of tool definitions the same way I think of dependencies: every one you add is a small standing cost you carry forever, so the ones you're not using are pure liability. This is the tokens-in-context version of the routing discipline I use to keep model spend down in how I cut my Claude API bill by 73%.
Fix #4: Defer Loading With Tool Search
The newest and, honestly, most satisfying fix is on-demand tool search. Instead of loading every tool definition up front, the runtime keeps only the tool names in context and defers the full schemas. When the agent decides it needs a capability, it searches for the matching tool and only then pulls that tool's full definition into context. You pay for the schema you use, at the moment you use it — not for the 200 you don't.
Why this is a big deal:
On Claude, tool search auto-enables once tool definitions cross about 10% of the context window, and teams running many servers have reported up to a 95% cut in startup token cost. It turns “I connected too many servers” from a permanent tax into a lookup you pay only when it's earned.
I lean on exactly this pattern in my own automation stack — the agent sees a short list of tool names and fetches a tool's real schema only when a task calls for it. It's the difference between carrying the whole toolbox up the ladder and reaching down for the one wrench when you need it.
When an MCP Server Is Worth the Tax (and When a CLI Wins)
None of this means MCP is bad. For a capability an agent uses constantly, the structure an MCP server gives you — typed arguments, discoverable tools, a clean contract — is absolutely worth its token cost. The overhead amortizes across heavy use.
But for something the agent touches once in a while, a plain command-line tool it shells out to is often the leaner choice. The agent already knows how to run a shell, so there's no big schema to preload — the standing cost is close to zero and you only pay when it actually runs the command. My rule of thumb: high-frequency, structured capability → MCP; occasional or simple capability → CLI. Decide it per capability, not as a religion. If you want the deeper version of this trade-off in a full build, it's the backbone of my production guide to building AI agents with the Claude API. And the official Model Context Protocol docs are the right place to see exactly what a server is putting into context.
The Short Version
- MCP tool definitions are input tokens — you pay for them on every request, even for tools you never call.
- The median server is small (~1,900 tokens), but the popular ones are huge: GitHub MCP is ~55,000 tokens on its own.
- Stack a few big servers and 150,000+ tokens of overhead per request is real — that's ~$270/day on an agent fired 1,000 times.
- Measure the standing overhead before the first message, so a new server can't inflate every request in secret.
- Connect fewer scoped servers, prune tools you never call, and defer schemas with on-demand tool search (up to ~95% less startup cost).
- MCP for high-frequency structured work; a CLI for occasional capabilities. Decide per capability.
Paying for Tools Your Agent Never Uses?
I build production AI agents on Claude, the Anthropic API, MCP, and n8n with the token overhead measured and controlled from day one — scoped servers, pruned tools, and deferred tool loading wired in. If your agent bill feels heavier than the work it does, let's find the tax and cut it.
Related Posts
AI Agents
Running AI Agents on Autopilot Loops Without Torching Your Budget
AI agents in autopilot loops can burn hundreds of dollars in a weekend. The four cost controls I wire into every loop — a hard token budget that aborts, model routing, state-based dedup, and a failure kill-switch — with real numbers from production.
AI Agents
OpenClaw: An AI Agent Gateway for Multi-Tenant Agency Stacks
Why I built OpenClaw, how the gateway routes between Claude, GPT, and Gemini, and the cost/latency lessons from production.
AI Agents
AI Agent Cost Optimization: How I Cut Claude API Bills by 73%
Real production strategies for cutting AI agent costs — caching, model routing, prompt compression, and the math behind every decision.