Running MCP Servers in Production: Patterns and Pitfalls

Getting an MCP server running locally is straightforward. Getting it running reliably in production — with multiple users, proper security, and monitoring — takes more thought.

This guide covers the architecture patterns teams are using to run MCP servers in production, based on our research across the MCP ecosystem. We haven’t run all of these ourselves, but we’ve studied how teams are deploying them.

The Local-First Problem

Most MCP servers are designed for local use: they run on your machine, connect to your IDE or AI assistant, and access your local files or credentials. This works well for individual developers.

Production is different. You need:

Multiple users connecting to the same server or separate instances
Credential management without storing API keys on every machine
Monitoring to know when things break
Scaling when one instance isn’t enough
Security boundaries between users and between tools

The MCP specification doesn’t prescribe how to solve these problems. That’s left to implementers — which means teams are inventing their own patterns.

Common Architecture Patterns

Pattern 1: Gateway + Multiple MCP Servers

The most common production pattern puts a gateway in front of multiple MCP servers. The gateway handles:

Authentication — verifying who the user is
Authorization — deciding which tools they can access
Routing — sending requests to the right MCP server
Rate limiting — preventing abuse

Each MCP server behind the gateway handles one domain (database queries, file operations, API calls, etc.). The gateway presents a unified interface to clients.

When this works well: Organizations with multiple internal tools that want to expose them to AI agents through a single connection point.

Watch out for: The gateway becoming a bottleneck. If every tool call goes through one gateway process, you’ll need to make it highly available.

Pattern 2: Sidecar Deployment

In containerized environments, some teams run MCP servers as sidecars alongside their main application. Each application pod gets its own MCP server instance with access scoped to that application’s resources.

When this works well: Kubernetes environments where each service needs MCP access to its own data but shouldn’t touch other services.

Watch out for: Resource overhead. Each sidecar consumes memory and CPU. If you have hundreds of pods, that adds up.

Pattern 3: Serverless / On-Demand

Some MCP servers can run as serverless functions, spinning up per-request and shutting down after. This works for stateless operations — calling an API, running a query, transforming data.

When this works well: Infrequent tool use where you don’t want to keep a server running. Also good for burst traffic.

Watch out for: Cold start latency. MCP tool calls that take 3 seconds instead of 300ms because the function had to spin up will frustrate users. Also, stateful operations (maintaining a database connection pool, caching results) don’t fit this model.

Pattern 4: Shared Service

A single MCP server runs as a shared service that multiple clients connect to. This is the simplest production deployment — just one server to manage.

When this works well: Small teams, single-purpose tools, or internal tools where all users have similar access levels.

Watch out for: Tenant isolation. If User A’s tool call can see User B’s data, you have a security problem. Also, one misbehaving client can affect all others.

Credential Management

The hardest part of production MCP deployments is usually credential management. In development, your MCP server uses your local credentials — your API keys, your database password, your OAuth tokens. In production, you need something better.

Approaches That Work

Secret stores (Vault, AWS Secrets Manager, etc.): The MCP server fetches credentials from a central secret store at startup or per-request. This is the most robust approach but adds a dependency.

Environment injection: Credentials are injected as environment variables by the deployment system (Kubernetes secrets, Docker secrets, etc.). Simpler than a secret store but harder to rotate.

OAuth delegation: For user-facing MCP servers, have the user authenticate via OAuth. The MCP server acts on behalf of the user with their token. This avoids storing long-lived credentials entirely.

Approaches That Don’t

Hardcoded credentials in config files: This seems obvious, but many MCP server examples ship with placeholder API keys in config files. In production, someone will forget to change them — or commit real keys to git.

Shared service accounts with full access: If your MCP server connects to a database using an admin account, every tool call has admin privileges. Use the principle of least privilege.

Monitoring and Observability

An MCP server in production needs monitoring just like any other service. Here’s what to track:

Essential Metrics

Tool call latency — how long each tool takes to execute. Set alerts for anomalies.
Error rate — what percentage of tool calls fail. A sudden spike means something broke.
Active connections — how many clients are connected. Helps with capacity planning.
Resource usage — CPU, memory, and network for the MCP server process.

Useful Logs

Tool call logs — which tools are being called, by whom, with what parameters. Essential for debugging and security auditing.
Error details — full stack traces for failed tool calls, not just “tool call failed.”
Authentication events — who connected, when, and whether they were authorized.

What Teams Miss

Token and credential expiry monitoring. If your MCP server uses OAuth tokens or API keys with expiration dates, monitor those. An expired credential at 3 AM will page your oncall team.

Downstream dependency health. Your MCP server might be healthy, but if the API it calls is down, tool calls will fail. Monitor the things your MCP server depends on, not just the server itself.

Security in Production

Our MCP security guide covers security in depth. For production specifically, these are the critical points:

Input Validation

Every parameter from an AI model should be treated as untrusted input. Models can hallucinate parameters, and prompt injection attacks can manipulate tool calls. Validate and sanitize everything.

Blast Radius Limiting

If an MCP tool can delete data, limit what it can delete. If it can query a database, restrict which tables and operations are allowed. The question isn’t “will this tool be misused?” but “what’s the worst thing that can happen if it is?”

Audit Trails

Log every tool call with enough detail to reconstruct what happened. In production, you will eventually need to answer: “What did the AI agent do, and why?” Without audit trails, you can’t.

Common Pitfalls

1. Treating MCP Like a REST API

MCP connections are stateful. The client and server maintain a session. If you deploy MCP servers behind a load balancer that doesn’t support sticky sessions, connections will break mid-conversation.

2. No Graceful Degradation

If an MCP server goes down, what happens? Most AI clients will just report “tool unavailable.” Better: have the client handle the failure gracefully, perhaps falling back to a cached response or informing the user what went wrong.

3. Ignoring Tool Description Quality

In production, tool descriptions matter more than in development. A vague description like “query data” will lead to the AI model calling the tool at wrong times or with wrong parameters. Good descriptions reduce errors and support costs.

4. Skipping Load Testing

MCP tool calls aren’t just HTTP requests — they can trigger database queries, API calls, file operations, or computations. Load test your MCP servers with realistic tool call patterns, not just connection counts.

5. No Versioning Strategy

When you update an MCP server, existing clients might break if tools change. Have a versioning strategy — whether that’s running old and new versions side-by-side, using semantic versioning in tool names, or maintaining backwards compatibility.

Deployment Checklist

Before putting an MCP server in production, verify:

Authentication — clients must prove who they are
Authorization — each tool call is checked against permissions
Credential rotation — you can rotate secrets without downtime
Monitoring — you’ll know when something breaks
Logging — you can reconstruct what happened after an incident
Rate limiting — one client can’t overwhelm the server
Input validation — all tool parameters are validated
Graceful shutdown — active connections are drained on restart
Health checks — load balancers can detect when the server is unhealthy
Backup plan — you know what to do if the MCP server goes down entirely

Real-World Case Study: Pinterest

Pinterest’s MCP deployment is one of the most detailed public examples of MCP running at enterprise scale. Their engineering team published their architecture in March 2026, and the numbers are worth studying.

Scale

As of early 2025, Pinterest’s MCP infrastructure handles 66,000 tool invocations per month across 844 active users, saving roughly 7,000 engineering hours per month. Their Presto MCP server — which lets agents query analytics data directly instead of context-switching to dashboards — is consistently the highest-traffic server.

Architecture

Pinterest runs a central MCP registry that acts as the source of truth for approved servers and their connectivity metadata. Both a human-friendly UI and an API allow discovery, validation, and integration into internal AI clients and IDEs. Clients consult the registry to validate permissions and server status before calling tools, enforcing governance consistently across the organization.

This mirrors Pattern 1 (Gateway + Multiple MCP Servers) described above, but with a registry layer that adds discoverability and governance — something the broader MCP ecosystem is working toward through the planned MCP Registry specification (Q4 2026 roadmap).

Security Model

Pinterest uses a two-layer authorization model:

End-user JWTs control human-in-the-loop access
Service-only flows rely on mesh identities for automated agent operations

Servers implement fine-grained authorization decorators and business-group gating, restricting high-privilege operations to approved teams. This is a practical example of blast radius limiting — different teams get different tool permissions based on their role.

Key Takeaway

Pinterest’s deployment validates that MCP works at production scale when treated as infrastructure rather than a developer convenience. The registry pattern, two-layer auth model, and per-team gating are patterns other organizations can adopt. For details, see Pinterest’s engineering blog post.

Where Things Are Heading

The MCP ecosystem is maturing rapidly. Pinterest’s deployment shows that internal MCP infrastructure is already delivering measurable ROI at scale. As adoption grows, expect:

Managed MCP hosting — cloud providers offering MCP server hosting with built-in auth, monitoring, and scaling
Standard authentication — the MCP specification may eventually include authentication standards (it currently doesn’t prescribe one)
Better tooling — purpose-built monitoring, debugging, and testing tools for MCP servers

For now, production MCP deployment requires borrowing patterns from microservices, API gateways, and traditional service operations. The teams doing it well are the ones treating MCP servers as production services — not just development conveniences.