How many tokens does an MCP server use?

It depends entirely on how many tools the server exposes and how verbose their schemas are. Across thousands of measured servers the median is under 2,000 tokens of tool definitions, but the heavy hitters dominate: the GitHub MCP server ships about 93 tools and roughly 55,000 tokens. Connect GitHub, Linear, and Supabase together and the total can only be higher: at least 55,000 tokens, more than a quarter of a 200k context window, loaded before the agent reads a single user message. The number you care about is not the median; it is the sum of the servers you actually have connected.

Why do MCP servers cost money even when I don't call their tools?

Because tool definitions are input tokens. When an MCP server connects, the model has to be told what tools exist, what they do, and what arguments they take, and that description is prepended to the context on every request in the conversation. You pay for those input tokens whether or not the agent ever invokes the tool. A connected-but-unused server is pure overhead — you are paying to describe a capability the agent never touches.

How do I reduce MCP token cost?

Connect fewer servers per agent and scope each one to the job it actually does. Prune tools you never call so their schemas stop loading. Use on-demand tool search, which defers loading tool definitions and only pulls a tool's full schema into context when the agent needs it — teams with many servers report up to a 95% cut in startup token cost from this alone. And measure the overhead per run so a new server can't quietly add tens of thousands of tokens without you noticing.

Is it cheaper to use a CLI instead of an MCP server?

Sometimes, yes. A command-line tool the agent shells out to costs almost nothing in standing context — the agent already knows how to run a shell, so there is no big schema to preload. An MCP server pays its token cost up front on every request. For a capability the agent uses constantly, the MCP server's structure is worth it; for something it touches once in a while, a CLI the agent calls on demand is often the cheaper, leaner choice. The right answer is per capability, not a blanket rule.

What MCP Servers Actually Cost You in Tokens (and How I Cut the Context Tax)

The Short Answer: You Pay to Describe Tools You Never Use

When an MCP server connects to your agent, the model has to be told what tools exist, what each one does, and what arguments it takes. That description is input tokens, and it rides along on every request in the conversation — whether or not the agent ever calls the tool. A connected-but-unused server isn't free. It's a standing charge you pay on every single turn.

I ran into this the honest way: I'd wired up a stack of MCP servers on an agent because I might need all of them, felt the responses get slower and pricier, and went looking for where the tokens went. Most of them were gone before my agent read the first instruction. The tools weren't the problem. The descriptions of the tools were.

How Many Tokens Are We Actually Talking About?

The median MCP server is small — under 2,000 tokens of tool definitions across the thousands that have been measured. But medians hide the pain, because a handful of popular servers are enormous, and those are exactly the ones people connect. Here's the shape of it:

Setup	Approx. tools	Tokens before first message
Median single server	a few	~1,900
GitHub MCP	~93	~55,000
GitHub + Linear + Supabase	100+	55,000+ (over 27% of a 200k window)
“Kitchen sink” multi-server agent	200+	150,000+

Put a price on it. At a representative $3 per million input tokens, 90,000 tokens of schema overhead is about $0.27 per request. That sounds trivial until you run the agent on a loop. A background job that fires that agent a thousand times a day is spending roughly $270 a day describing tools it mostly never calls — around $8,000 a month of pure overhead. This is the exact kind of silent, structural cost I obsess over when I build autonomous systems, and it's a close cousin of the runaway-loop trap I wrote about in running AI agents on autopilot without torching your budget.
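Here's a minimal sketch of that arithmetic in Python, so you can plug in your own numbers. The function name and the 30-day month are my assumptions for illustration; the $3 per million price is the representative figure from above, not a quote for any particular model.

```python
def overhead_cost(schema_tokens: int, price_per_million: float, requests_per_day: int) -> dict:
    """Dollar cost of standing tool-definition tokens (illustrative helper)."""
    per_request = schema_tokens / 1_000_000 * price_per_million
    per_day = per_request * requests_per_day
    return {
        "per_request": round(per_request, 4),
        "per_day": round(per_day, 2),
        "per_month": round(per_day * 30, 2),  # assumes a 30-day month
    }


# The example from the text: 90,000 schema tokens, $3/M input, 1,000 runs a day.
cost = overhead_cost(schema_tokens=90_000, price_per_million=3.0, requests_per_day=1_000)
print(cost)
```

Swap in your own floor and request volume; the point is that the cost scales with how often the agent runs, not with how often it calls a tool.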

[Figure: MCP server token overhead on a cost dashboard, reviewed before trimming connected servers. Caption: The overhead you want to find before your invoice does: tokens spent before the first real instruction.]

Fix #1: Measure the Tax Before You Argue About It

You can't cut what you don't measure. Before I touch anything, I look at how many tokens the connected servers add before the first user message — the standing overhead. Most agent runtimes will report input token counts per request; the schema overhead is the floor you never drop below no matter how short your prompt is.

The trick is to make this a number you watch, not a thing you discover on a bill. Add a new server, check the floor moved by X tokens, decide if the capability is worth X on every future request. That one habit turns “why is this agent so expensive?” into a decision you made on purpose.
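As a rough sketch of that habit, the snippet below estimates how far a new server moves the floor and checks it against a budget. Everything here is an assumption for illustration: the tool definitions are made up, and the 4-characters-per-token ratio is a crude heuristic, not a real tokenizer. The input token count your runtime reports is the number to trust; this only gives a quick before/after comparison.

```python
import json

CHARS_PER_TOKEN = 4  # crude heuristic (assumption), not a real tokenizer


def estimate_tokens(tool_definitions: list[dict]) -> int:
    """Approximate the tokens a list of tool definitions adds to context."""
    serialized = json.dumps(tool_definitions, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def floor_delta(before: list[dict], after: list[dict]) -> int:
    """How many tokens the standing floor moved after a change."""
    return estimate_tokens(after) - estimate_tokens(before)


def within_budget(delta_tokens: int, budget_tokens: int) -> bool:
    """True if the added overhead fits the budget you chose on purpose."""
    return delta_tokens <= budget_tokens


# Hypothetical tool definitions, for illustration only.
existing = [{
    "name": "search_tickets",
    "description": "Search support tickets by keyword.",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
}]
new_server = [{
    "name": "deploy_service",
    "description": "Deploy a named service to a target environment.",
    "input_schema": {
        "type": "object",
        "properties": {"service": {"type": "string"}, "env": {"type": "string"}},
    },
}]

delta = floor_delta(existing, existing + new_server)
print(f"floor moved by ~{delta} tokens; within budget: {within_budget(delta, 10_000)}")
```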

Fix #2: Connect Fewer, Scoped Servers

The most common mistake I see, and one I have made myself, is treating MCP servers like browser tabs: open them all, just in case. Don't. Give each agent only the servers its actual job needs. A support-triage agent does not need your cloud provider's 60-tool server hanging off it. A deploy agent does not need the CRM.

Scoping is the highest-leverage fix because it removes the overhead entirely instead of shrinking it. A server you don't connect costs zero tokens. When I split one bloated do-everything agent into two scoped ones, each one's per-request floor dropped hard, and both got a little sharper too — fewer tools means fewer chances for the model to reach for the wrong one. That's the same “give the model less to trip over” principle behind the gateway I described in my write-up on routing agents through a single gateway.

Fix #3: Prune the Tools You Never Call

Even a server you legitimately need often exposes far more tools than you use. The GitHub server ships around 93; a given agent might touch five. Every one of the other 88 still loads its definition into context on every request. If your runtime lets you allowlist or filter the tools a server exposes, use it — keep the five, drop the rest, and watch the floor fall.
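If your runtime doesn't offer a filter but lets you control the tool list you pass to the model, an allowlist can be sketched in a few lines. This is a minimal illustration, assuming tool definitions are plain dicts with a "name" key; the tool names below are hypothetical and not taken from any real server.

```python
def prune_tools(tools: list[dict], allowlist: set[str]) -> tuple[list[dict], set[str]]:
    """Keep only allowlisted tools; also report allowlisted names the server doesn't expose."""
    kept = [tool for tool in tools if tool["name"] in allowlist]
    exposed = {tool["name"] for tool in tools}
    missing = allowlist - exposed  # catches typos or tools renamed upstream
    return kept, missing


# Hypothetical tool definitions, for illustration only.
server_tools = [
    {"name": "create_issue", "description": "Open a new issue."},
    {"name": "list_issues", "description": "List open issues."},
    {"name": "delete_repo", "description": "Delete a repository."},
    {"name": "manage_webhooks", "description": "Create or remove webhooks."},
]

kept, missing = prune_tools(server_tools, {"create_issue", "list_issues", "close_issue"})
print([tool["name"] for tool in kept], missing)
```

Returning the missing names is a small design choice: a silently ignored allowlist entry is how an agent ends up without a tool you thought it had.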

I think of tool definitions the same way I think of dependencies: every one you add is a small standing cost you carry forever, so the ones you're not using are pure liability. This is the tokens-in-context version of the routing discipline I use to keep model spend down in how I cut my Claude API bill by 73%.

Fix #4: Defer Loading With Tool Search

The newest and, honestly, most satisfying fix is on-demand tool search. Instead of loading every tool definition up front, the runtime keeps only the tool names in context and defers the full schemas. When the agent decides it needs a capability, it searches for the matching tool and only then pulls that tool's full definition into context. You pay for the schema you use, at the moment you use it — not for the 200 you don't.

Why this is a big deal:

In Claude Code, tool search auto-enables once MCP tool definitions cross about 10% of the context window, and teams running many servers have reported up to a 95% cut in startup token cost. It turns “I connected too many servers” from a permanent tax into a lookup you pay only when it's earned.

I lean on exactly this pattern in my own automation stack — the agent sees a short list of tool names and fetches a tool's real schema only when a task calls for it. It's the difference between carrying the whole toolbox up the ladder and reaching down for the one wrench when you need it.
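To make the pattern concrete, here is a toy sketch of deferred loading. It is not how any particular runtime implements tool search: the class, its method names, the naive keyword match, and the tool definitions are all hypothetical. It only shows the shape of the idea, which is that names stay visible while full definitions enter context on demand.

```python
class DeferredToolRegistry:
    """Toy registry: tool names are always visible, full schemas load on demand."""

    def __init__(self, tools: list[dict]):
        self._full = {tool["name"]: tool for tool in tools}
        self._loaded: dict[str, dict] = {}

    def names(self) -> list[str]:
        """The short list the agent always sees."""
        return sorted(self._full)

    def search(self, query: str) -> list[str]:
        """Naive keyword match over names and descriptions (assumption: real search is smarter)."""
        words = query.lower().split()
        hits = []
        for name, tool in self._full.items():
            haystack = f"{name} {tool.get('description', '')}".lower()
            if any(word in haystack for word in words):
                hits.append(name)
        return sorted(hits)

    def load(self, name: str) -> dict:
        """Pull one tool's full definition into context."""
        self._loaded[name] = self._full[name]
        return self._loaded[name]

    def context_tools(self) -> list[dict]:
        """Only the definitions that have actually been loaded."""
        return list(self._loaded.values())


# Hypothetical tools, for illustration only.
registry = DeferredToolRegistry([
    {"name": "create_ticket", "description": "Open a support ticket."},
    {"name": "deploy_service", "description": "Deploy a service to production."},
    {"name": "query_db", "description": "Run a read-only SQL query."},
])

loaded_before = len(registry.context_tools())
matches = registry.search("deploy")
registry.load(matches[0])
loaded_after = len(registry.context_tools())
print(registry.names(), matches, loaded_before, loaded_after)
```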

When an MCP Server Is Worth the Tax (and When a CLI Wins)

None of this means MCP is bad. For a capability an agent uses constantly, the structure an MCP server gives you — typed arguments, discoverable tools, a clean contract — is absolutely worth its token cost. The overhead amortizes across heavy use.

But for something the agent touches once in a while, a plain command-line tool it shells out to is often the leaner choice. The agent already knows how to run a shell, so there's no big schema to preload — the standing cost is close to zero and you only pay when it actually runs the command. My rule of thumb: high-frequency, structured capability → MCP; occasional or simple capability → CLI. Decide it per capability, not as a religion. If you want the deeper version of this trade-off in a full build, it's the backbone of my production guide to building AI agents with the Claude API. And the official Model Context Protocol docs are the right place to see exactly what a server is putting into context.
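One way to put a number on "worth the tax" is to spread a server's standing tokens over the calls that actually use it. The sketch below is my own illustrative metric, not a standard one, and the figures in the example are made up: a server at roughly the median size, attached to 1,000 requests, compared at two usage levels.

```python
def standing_tokens_per_call(schema_tokens: int, requests: int, tool_calls: int) -> float:
    """Standing schema tokens paid per tool call that actually happened."""
    if tool_calls == 0:
        return float("inf")  # pure overhead: connected but never used
    return schema_tokens * requests / tool_calls


# Hypothetical usage: a 2,000-token server attached to 1,000 requests.
heavy_use = standing_tokens_per_call(schema_tokens=2_000, requests=1_000, tool_calls=900)
rare_use = standing_tokens_per_call(schema_tokens=2_000, requests=1_000, tool_calls=5)
print(round(heavy_use), round(rare_use))
```

When the per-call figure dwarfs what a shell command and its output would cost, that capability is a candidate for a CLI.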

The Short Version

MCP tool definitions are input tokens — you pay for them on every request, even for tools you never call.
The median server is small (~1,900 tokens), but the popular ones are huge: GitHub MCP is ~55,000 tokens on its own.
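Stack a few big servers and 90,000 tokens of overhead per request is realistic: at $3 per million input tokens, that's ~$270/day on an agent fired 1,000 times a day. A 200+ tool kitchen-sink setup at 150,000+ tokens costs more still.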
Measure the standing overhead before the first message, so a new server can't inflate every request in secret.
Connect fewer scoped servers, prune tools you never call, and defer schemas with on-demand tool search (up to ~95% less startup cost).
MCP for high-frequency structured work; a CLI for occasional capabilities. Decide per capability.

Paying for Tools Your Agent Never Uses?

I build production AI agents on Claude, the Anthropic API, MCP, and n8n with the token overhead measured and controlled from day one — scoped servers, pruned tools, and deferred tool loading wired in. If your agent bill feels heavier than the work it does, let's find the tax and cut it.
