April 2026
Shipping One of the First Production MCP Servers
Auth design, rate limiting, and the decisions that shaped Schema App's MCP integration.
In late 2025, my team at Schema App shipped a production MCP (Model Context Protocol) server that lets enterprise customers expose their Content Knowledge Graph to AI assistants — ChatGPT, Copilot, Claude, and Gemini. As far as we could tell, we were one of the first companies to do this in the structured data space.
This is what we learned. Not a tutorial — more of an honest accounting of the decisions we made, the ones that worked, and the ones I'd revisit.
The problem we were solving
Schema App manages knowledge graphs for enterprise customers. These graphs contain structured data about products, organizations, locations, events — the kind of information that's useful to AI assistants trying to answer real questions. The gap was: customers had this rich structured data, but no way to let AI agents access it programmatically with proper access controls.
MCP was the right protocol for this. It gives AI assistants a standard way to discover and call tools, read resources, and interact with external systems. But in late 2025, the spec was still evolving. Documentation was sparse. There were maybe a handful of production implementations we could reference, none in our domain.
The auth problem
The first big decision was authentication. MCP doesn't prescribe an auth model — it's transport-agnostic. We had to decide: do we use OAuth 2.0 flows, API keys, or something custom?
We went with API keys scoped per customer, with a secondary permission layer that controlled which graph entities each key could access. The reasoning: our customers are enterprises with existing API key management workflows. OAuth would have been more “correct” but would have added weeks of integration work on their side for a protocol they were already skeptical about.
In retrospect, this was the right call for adoption speed. But it means we'll need to add OAuth support eventually — some customers are starting to ask for it as their AI integrations mature. The lesson: optimize for the adoption you need now, but design the abstraction so you can swap the auth layer later. We did the second part right, thankfully.
Rate limiting when you don't know the traffic pattern
With traditional APIs, you have months of traffic data to inform your rate limits. We had zero. Nobody — including us — knew what LLM-driven traffic to an MCP server would look like. Would agents make 10 requests per session or 10,000? Would they respect backoff headers?
We started conservative: 60 requests per minute per API key, with a token bucket that allowed short bursts. Then we instrumented everything and watched. It turned out most AI assistants were surprisingly well-behaved — they'd make a few tool-discovery calls, then a handful of resource reads. The pathological case was agents in loops that kept retrying failed queries with slightly different parameters. We added a similarity-based dedup layer that caught most of these.
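The "60 requests per minute with short bursts" shape is a standard token bucket. Here's a sketch of the idea, not our production code; the injectable clock is just there to make the behavior testable. The bucket refills continuously at the sustained rate but caps at `capacity`, which is what permits bursts.

```python
import time

class TokenBucket:
    """Refills at `rate` tokens/second up to `capacity`, so sustained
    traffic is capped at `rate` while short bursts up to `capacity`
    are allowed through."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full: a fresh key can burst
        self.now = now
        self.last = now()

    def allow(self, cost: float = 1.0) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, never past capacity
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# 60 requests/minute sustained, with bursts of up to 10 (burst size is illustrative)
per_key_bucket = TokenBucket(rate=60 / 60.0, capacity=10)
```

In a real deployment this state would live per API key, and (per the lesson below about redeploys) ideally outside the server process so limits can be tuned without shipping code.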
The thing I'd do differently: we should have built the rate limiting as a separate service from day one instead of embedding it in the MCP server process. When we needed to adjust limits per-customer, it required a redeploy. That's a solved problem and we just didn't prioritize it early enough.
What surprised us
Three things caught us off guard:
Tool descriptions matter more than tool implementations. The quality of the natural-language descriptions we wrote for each MCP tool directly affected how well AI assistants used them. A poorly described tool would get called with wrong parameters or not called at all. We ended up spending almost as much time on tool descriptions as on the tool code itself.
Enterprise customers want audit logs, not dashboards. We built a usage dashboard. Nobody looked at it. What they wanted was a downloadable audit log showing exactly which AI agent accessed which data and when. Compliance teams need artifacts, not visualizations.
The spec will change under you. We shipped against an early version of MCP and had to update twice in three months as the protocol evolved. Building against a moving spec is uncomfortable, but waiting for stability would have meant missing the window entirely.
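On the first point: MCP tools are advertised to assistants as a name, a natural-language description, and a JSON Schema for inputs, and that description is effectively the tool's documentation as far as the model is concerned. Here's an illustrative before/after — the `search_entities` tool, its parameters, and its wording are invented for this example, not taken from Schema App's server.

```python
# A sparse description leaves the assistant guessing at parameter
# semantics, so it calls the tool wrongly or not at all.
sparse_tool = {
    "name": "search_entities",
    "description": "Search the graph.",
}

# The same tool, described like you'd explain it to a smart intern:
# what it returns, what each parameter means, formats, and defaults.
descriptive_tool = {
    "name": "search_entities",
    "description": (
        "Search the customer's Content Knowledge Graph for entities "
        "matching a free-text query. Returns at most `limit` entities "
        "as JSON-LD, each with an @id, @type, and name. Pass "
        "`entity_type` (a schema.org type such as 'Product' or "
        "'Event') to narrow results; omit it to search all types."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search terms."},
            "entity_type": {"type": "string", "description": "Optional schema.org type filter."},
            "limit": {"type": "integer", "default": 10, "maximum": 50},
        },
        "required": ["query"],
    },
}
```

Treating that description string as a first-class artifact — reviewed, iterated on, and tested against real assistant behavior — is the practical upshot.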
What I'd tell someone shipping their first MCP server
Start with one tool and one resource type. Get the auth and transport layer right first — everything else is iteration. Instrument from day one because you genuinely don't know what traffic will look like. Write your tool descriptions like you're explaining the API to a smart intern who's never seen your codebase. And don't wait for the spec to stabilize — by the time it does, your competitors will have shipped.
The MCP server is now one of Schema App's key differentiators. Customers who were skeptical about “another protocol” are now asking when we're adding more tools. That shift from skepticism to pull — that's when you know the bet paid off.
Have thoughts on this? I'd like to hear them: isser.akhil@gmail.com