MCP servers are no longer standalone tools running on a developer’s laptop. As organizations adopt MCP for production AI systems, these servers are becoming first-class microservices — deployed in Kubernetes, routed through service meshes, discovered via registries, and monitored with distributed tracing. The architectural challenge is clear: MCP’s stateful session model collides with the stateless, horizontally-scalable world of microservices.
The ecosystem is responding fast. Envoy’s AI Gateway has native MCP proxy support. IBM’s ContextForge (3,500+ stars) federates MCP, A2A, REST, and gRPC services. Stacklok’s ToolHive (1,700+ stars) provides a Kubernetes operator with a dedicated MCPServer CRD. Traefik Labs introduced Task-Based Access Control specifically for AI agent authorization. And a major specification proposal (SEP-1442) aims to make MCP stateless by default, solving the load balancing problem at the protocol level.
This guide covers the patterns, tools, and trade-offs for running MCP servers in distributed microservice architectures. Our analysis draws on published documentation, GitHub repositories, and vendor materials — we research and analyze rather than deploying these systems ourselves. Rob Nugen operates ChatForest; the site’s content is researched and written by AI.
MCP as a Critical Microservice
Signadot’s analysis on The New Stack frames the shift well: MCP servers are the “translation layer between probabilistic LLM reasoning and deterministic microservices.” Traditional service-to-service communication uses strict REST or gRPC contracts, but MCP servers expose a world model rather than just API endpoints. This changes the deployment calculus.
Key production concerns for MCP microservices:
Idempotency. Confused agents can trigger destructive loops if MCP tool calls aren’t idempotent. Every mutation exposed through MCP should produce the same result whether called once or five times — agents retry more aggressively than traditional HTTP clients.
Error message engineering. When an MCP tool call fails, the response goes to an LLM, not a human. Raw stack traces or HTTP error codes cause hallucination loops. MCP servers need carefully constructed natural-language error messages that help the agent understand what went wrong and what to try instead.
Context economy. MCP servers that dump full database query results or raw API responses waste context window space. Production MCP servers should transform data into optimized, context-ready snippets — returning the 5 most relevant rows, not 5,000.
Blast radius isolation. Each MCP server should be treated as an independent microservice with its own blast radius, behind TLS, with scoped API keys. A compromised filesystem MCP server shouldn’t give an attacker access to your database MCP server.
Deployment Patterns
Three patterns dominate how organizations integrate MCP servers into microservice architectures.
Sidecar Pattern: MCP Per Service
The MCP Sidecar Pattern deploys an MCP server container alongside the application container in a Kubernetes Pod or Docker Compose service, sharing local storage. This pattern emerged because many MCP servers only support stdio transport and must run locally with the process they serve.
How it works:
- The MCP server runs as a sidecar container in the same Pod
- Communication happens over localhost or shared Unix sockets
- The sidecar translates between
stdioand HTTP for external access - Each service gets its own MCP interface while maintaining process isolation
When to use: Internal tool exposure where each microservice needs AI-accessible tools specific to its domain (e.g., a billing service exposing charge-lookup tools, an inventory service exposing stock-check tools).
Gateway Pattern: Centralized MCP Routing
The gateway pattern places multiple MCP servers behind a single gateway that handles routing, authentication, rate limiting, and observability. This is the emerging enterprise standard, with Microsoft, IBM, Kong, Traefik, Docker, Envoy, and Portkey all offering gateway solutions.
How it works:
- Clients connect to a single gateway endpoint
- The gateway routes tool calls to the appropriate backend MCP server
- Cross-cutting concerns (auth, rate limiting, tracing) are handled once at the gateway
- Backend MCP servers can be added or removed without client changes
When to use: Production deployments where you need centralized security, observability, and lifecycle management across many MCP servers.
Spring Boot Integration: MCP as a Standard Java Microservice
Spring AI provides official MCP Server Boot Starters (spring-ai-mcp-server-spring-boot-starter) that turn any Spring Boot application into an MCP server. This makes MCP a natural part of the Java microservice ecosystem — deploy with the same CI/CD pipelines, monitoring, and infrastructure as any other Spring service.
Stateless MCP servers built with Spring AI are specifically designed for cloud-native deployment, requiring Java 17+ and Spring Boot 3.x.
Service Mesh Integration
Envoy AI Gateway: Native MCP Support
The Envoy AI Gateway has first-class MCP support, implemented as a lightweight Go-based MCP Proxy within the Envoy sidecar. Because it runs inside Envoy, it inherits the full networking stack — connection management, load balancing, circuit breaking, rate limiting, and observability — without adding additional infrastructure.
Session management: Uses a token-encoding architecture instead of centralized session stores. Multiple backend session IDs are encoded into unified session tokens, eliminating the need for distributed session state.
SSE reconnection: Handles Last-Event-ID headers for SSE stream reconnection, critical for long-running MCP sessions that may experience network interruptions.
Stream multiplexing: Merges long-lived SSE streams from multiple backend MCP servers into a single client stream, reducing connection overhead.
Security: OAuth authentication, fine-grained per-tool access control, upstream API key injection, and tool filtering — all configured declaratively.
Deployment: Works in both standalone mode and Kubernetes, with full compatibility between deployment models.
Istio and mTLS Between MCP Servers
Service meshes like Istio and Linkerd deploy sidecar proxies alongside each microservice — including MCP servers — intercepting all traffic and automatically handling certificate management and encryption. IBM’s ContextForge project has documented deployment patterns for MCP Gateway with mTLS in Kubernetes using Istio, where strict peer authentication policies enforce encrypted communication for all MCP traffic automatically.
This is particularly important for MCP because tool calls often carry sensitive data (database queries, API credentials, user context). With a service mesh, every MCP server-to-server call is encrypted without application-level changes.
Traefik Hub: Triple Gate Pattern
Traefik Labs developed a three-gate security architecture specifically for AI agent traffic:
- Gate 1 (AI Gateway): Protects the client-to-LLM pathway — PII filtering, jailbreak detection, prompt injection defense
- Gate 2 (MCP Gateway): Protects the LLM-to-MCP-Server pathway — TBAC authorization, tool-level policies, parameter validation
- Gate 3 (API Gateway): Protects the MCP-Server-to-External-API pathway — rate limiting, input validation, credential management
Task-Based Access Control (TBAC): Traefik introduced a three-dimensional authorization model replacing RBAC for AI agents:
- Tasks — what the agent is trying to accomplish
- Tools — which MCP tools the agent can access
- Transactions — limits on what each tool call can do
TBAC policies support JWT-based dynamic resolution using ${jwt.claim} and ${mcp.parameter} syntax, enabling adaptive policies that change based on the calling agent’s identity and context.
Service Discovery
In dynamic microservice environments, MCP clients need to discover available MCP servers without hardcoded endpoints.
Consul MCP Server
HashiCorp Consul — the established service discovery and mesh platform — has MCP server integrations. The 3loka/consul-mcp-server project provides:
- Health monitoring of microservices registered in Consul
- Visual diagrams of service connections
- Connectivity and performance diagnostics
- Load balancing optimization recommendations
A second implementation, kswap/consul-mcp, also bridges Consul’s service registry with MCP.
Kubernetes-Native Discovery
Several projects provide Kubernetes-native discovery for MCP servers:
- ToolHive Kubernetes Operator (Stacklok) — automated service discovery with ingress integration for MCP servers deployed as Kubernetes resources
- Microsoft MCP Gateway — uses headless Kubernetes services for session-aware pod routing
- LiteLLM Proxy — supports the MCP Registry spec for automatic server discovery, allowing external agent frameworks to discover hosted MCP servers through a standardized endpoint
- IBM ContextForge — Redis-backed federation for multi-cluster discovery across Kubernetes environments
Registry-Based Discovery
ContextForge acts as both registry and proxy, federating MCP, A2A, REST, and gRPC services into a unified discovery endpoint. ToolHive provides a Registry Server that curates trusted server catalogs with provenance verification and custom server registration.
For deeper coverage of MCP registries and discovery, see our MCP Registry and Server Discovery guide.
Load Balancing Stateful MCP Sessions
MCP’s stateful session model creates a fundamental tension with standard load balancing. Clients receive an mcp-session-id during initialization, and all subsequent requests must route to the same backend instance. Standard round-robin load balancers break this.
The Problem
A round-robin load balancer distributes requests evenly across MCP server instances. But MCP sessions maintain state — initialized capabilities, active subscriptions, cached resources. If request N goes to server A and request N+1 goes to server B, the session breaks.
Most cloud Application Load Balancers (AWS ALB, GCP LB) don’t support header-based session affinity, only cookie-based. Since MCP uses the mcp-session-id header, not cookies, standard cloud load balancers can’t handle sticky routing out of the box.
Current Strategies
Header-based routing with Nginx, HAProxy, or Envoy/Istio. These load balancers can route based on the mcp-session-id header, pinning sessions to specific backends. This works but limits horizontal scaling — you can’t evenly redistribute load when sessions are sticky.
Distributed state stores. Deploy stateless MCP servers that share session context via Redis or another external store. Each server instance can handle any request because state isn’t local. This enables standard load balancing at the cost of latency (one additional network hop per request for state retrieval).
Token-encoding architecture (Envoy AI Gateway). Encode session state directly in tokens rather than maintaining server-side session maps. The token carries enough information for any backend to reconstruct the session context.
StatefulSets with headless services (Microsoft MCP Gateway). Use Kubernetes StatefulSets for predictable pod identities and headless services for direct pod routing. The gateway maintains a session-to-pod mapping and routes accordingly.
SEP-1442: Making MCP Stateless by Default
A major specification proposal (SEP-1442) aims to solve this at the protocol level. Authored by engineers from multiple organizations, it proposes deprecating the initialization handshake and making MCP stateless by default.
Key changes proposed:
- Per-request
MCP-Protocol-Versionheaders replace the initialization handshake - A new
server/discoverRPC replacesinitializefor capability queries - The
MCP-Session-Idheader becomes optional, used only when stateful features are needed - Stateless MCP servers would work with any standard load balancer
Status: Draft, backward-incompatible change requiring a new protocol version. If adopted, this would be the single biggest improvement for MCP in microservice architectures.
For the current state of MCP transport evolution, see our MCP Transports Explained guide.
Distributed Tracing
MCP tool calls that span multiple services need distributed tracing. Without it, debugging a slow tool response means guessing which downstream service caused the delay.
OpenTelemetry Integration
The MCP community has proposed integrating OpenTelemetry tracing directly into the protocol (GitHub Discussion #269, Issue #246). The proposal would include trace IDs and span IDs as part of MCP client-to-server communication, enabling end-to-end visibility across agent → MCP server → downstream service chains.
Typical trace structure:
[TraceID: abc123]
→ Span: Agent prompt handling (Claude Agent)
→ Span: FetchTool invocation (MCP server)
→ Span: External API GET /data (3rd party API)
Key span attributes: tool.name, tool.input_size, tool.output_size, http.status_code, error.type.
FastMCP: Built-in OpenTelemetry
FastMCP includes native OpenTelemetry instrumentation with zero configuration:
- Traces auto-generated for tool, prompt, resource, and resource template operations
- W3C TraceContext propagation between MCP client and server processes
- Client spans and server spans share the same
trace_id - Server tool execution spans are children of client tool call spans
- Works with any OTLP-compatible backend (Jaeger, Zipkin, Grafana Tempo, Datadog)
- The
opentelemetry-instrumentcommand wraps any FastMCP server
MCPcat: MCP Events as OTLP Traces
MCPcat (available on PyPI as mcpcat) transforms MCP events into OTLP-compliant traces:
- Maps MCP session IDs to trace IDs, event IDs to span IDs
- GZIP compression reduces payload sizes by 60-80%
- Supports Jaeger, Tempo, New Relic, Datadog, Sentry
Traceloop OpenTelemetry MCP Server
The traceloop/opentelemetry-mcp-server provides a unified MCP server for querying OpenTelemetry traces across multiple backends — Jaeger, Tempo, and Traceloop. It enables AI agents to find slow traces, identify errors, compare model efficiency, and track token consumption directly through MCP tool calls.
Python SDK Instrumentation
The opentelemetry-instrumentation-mcp package (v0.52.3 on PyPI) instruments MCP Python SDK workflows, logging prompts, completions, and tool usage to span attributes. Privacy controls available via TRACELOOP_TRACE_CONTENT=false.
For deeper coverage of MCP observability, see our MCP Logging and Observability guide.
API Gateways for MCP
API gateways sit between clients and MCP servers, providing security, routing, rate limiting, and observability at the network edge. The MCP gateway market has exploded — the e2b-dev/awesome-mcp-gateways list (87 stars) catalogs the full ecosystem.
Kong AI Gateway
Kong’s enterprise AI Gateway includes an AI MCP Proxy Plugin that operates in three modes:
- Proxy MCP — Forward MCP traffic with added security and observability
- REST-to-MCP — Convert existing REST APIs into MCP tools without modifying backends
- Tool grouping — Expose curated tool sets from multiple backends through a single endpoint
The companion AI MCP OAuth2 Plugin adds OAuth 2.0 authentication for MCP traffic, with ACL rules at both default and per-tool levels. Enterprise-only.
IBM ContextForge (3,500+ Stars)
ContextForge is an AI Gateway, registry, and proxy that federates MCP, A2A, REST, and gRPC into a unified endpoint:
- 40+ plugins for transports, protocols, and integrations
- Redis-backed federation for multi-cluster Kubernetes deployment
- OpenTelemetry tracing with Phoenix, Jaeger, Zipkin, Tempo, Datadog, New Relic
- Built-in auth, retries, and rate limiting
- 7,000+ tests
Docker MCP Gateway
Docker’s open-source MCP Gateway runs MCP servers in isolated Docker containers with restricted privileges:
- Built-in logging and call-tracing
- Generic interceptors for container image signature verification and secret scanning
- Available in Docker Desktop and as an open-source project
- Companion
mcp-composetool (by phildougherty) provides Docker Compose-style YAML for multi-server configurations with STDIO-to-HTTP translation, session management, and connection pooling
Microsoft MCP Gateway (549 Stars)
A .NET 8 implementation purpose-built for Kubernetes:
- Session-aware stateful routing via headless services
- Azure Entra ID authentication with RBAC
- Data plane + control plane architecture
- StatefulSets for session-aware pod management
Portkey MCP Gateway
A managed gateway providing per-workspace and per-user tool access control, rate limits, content filters, approval workflows, and comprehensive audit logging of every tool call.
LiteLLM Proxy
A unified proxy for 100+ LLM models, A2A agents, and MCP tools with:
- MCP tool access control by team/key via
allowed_tools - OAuth token exchange mediation
- MCP Registry spec support for automatic server discovery
Apache APISIX
SSE streaming support with per-tenant quotas via JWT claims, multi-server failover with retry logic, and distributed tracing via SkyWalking and Zipkin plugins.
For more on gateway patterns, see our MCP Gateway and Proxy Patterns guide.
Event-Driven MCP Patterns
MCP servers don’t have to be request-response only. Event-driven architectures connect MCP servers to message brokers, enabling reactive patterns where tools respond to streaming data.
Kafka MCP Servers
kanapuli/mcp-kafka— Create/list/delete topics, produce messages with keys and headers, consume with configurable timeoutstuannvm/kafka-mcp-server— Go implementation using franz-go and mcp-go; produce/consume, list brokers, describe topics, consumer group lag metrics- StreamNative MCP Server — Connects any Kafka or Pulsar service to AI agents; cluster resource management via natural language
- AWS MSK MCP Server — Official AWS server for Amazon Managed Streaming for Kafka
Confluent published integration guidance on powering multi-agent systems with real-time data using MCP and their Kafka platform, specifically for decoupled, event-driven agent communication.
RabbitMQ MCP Servers
amazon-mq/mcp-server-rabbitmq— Official Amazon MQ implementation supporting multiple simultaneous broker connectionsLarsCowe/queue-pilot— Supports both RabbitMQ and Kafka with JSON Schema validationkmitchell/rabbitmq-mcp— RabbitMQ HTTP API integration
NATS MCP Servers
sinadarbouy/mcp-nats— Go implementation with real-time subscriptions, JetStream support for persistent publishing, consumer management; supports both SSE and stdio transportsbmorphism/nats-mcp-server— NATS CLI-based with publish, subscribe, and request-reply messaging
When to Use Event-Driven MCP
Real-time data pipelines. An AI agent monitoring production systems can subscribe to Kafka topics via MCP, receiving alerts and metrics as they happen rather than polling.
Decoupled multi-agent systems. Multiple agents publish and consume events through shared message brokers, coordinating without direct dependencies.
Audit and compliance. Every MCP tool call published to a message queue creates an immutable audit trail that compliance systems can consume independently.
Circuit Breakers and Resilience
When MCP servers depend on downstream microservices, failures cascade. Standard microservice resilience patterns apply — with MCP-specific considerations.
Circuit Breaker
Monitor MCP tool call failure rates and trip to “open” state when a threshold is exceeded. Three states: closed (normal), open (all calls rejected), half-open (testing recovery). This is critical because agent retry loops amplify failures — a circuit breaker stops the cascade before it begins.
Retry with Exponential Backoff and Jitter
Particularly important for MCP because agents retry more aggressively than traditional HTTP clients. Without jitter, synchronized retries from multiple agents create thundering herd problems.
Bulkhead
Isolate resource pools per MCP server so one failing server can’t starve others of threads or connections. If the filesystem MCP server is overwhelmed, the database MCP server should be unaffected.
Timeout
Set upper bounds on MCP tool call latency. This is critical because sequential agent reasoning loops break with slow servers — one slow tool call blocks the entire chain of thought.
Framework Support
- Envoy — Native circuit breaking, retry, and timeout for all proxied MCP traffic
- Resilience4j (Java/Spring Boot) — Circuit breaker, retry, and bulkhead for Spring AI MCP servers
- Kong — Rate limiting and circuit breaking via plugin architecture
- ContextForge — Built-in retries and rate limiting
For resilience patterns specific to MCP, see our MCP Error Handling and Resilience guide.
Kubernetes Orchestration
MCP Servers for Managing Kubernetes
Two leading projects let AI agents manage Kubernetes clusters through MCP:
containers/kubernetes-mcp-server(1,200+ stars) — Go-native, directly hits the Kubernetes API with no kubectl dependency. Supports CRUD on any Kubernetes or OpenShift resource plus Helm operations. Distributed as a single binary, npm package, Python package, and Docker image. Helm chart:helm install kubernetes-mcp-server oci://ghcr.io/containers/charts/kubernetes-mcp-serverFlux159/mcp-server-kubernetes(1,100+ stars) — Node.js implementation with kubectl_apply, kubectl_scale, kubectl_patch, kubectl_rollout, and Helm operations. Optional OpenTelemetry integration, non-destructive mode, and secrets masking.
ToolHive Kubernetes Operator (1,700+ Stars)
Stacklok’s ToolHive is the most mature MCP-to-Kubernetes operator:
- Custom Resource Definition: A dedicated
MCPServerCRD (currently alpha) defines MCP servers as Kubernetes-native resources - Automated lifecycle: The operator watches for MCPServer resources, automatically creating pods, services, and configurations
- Security: RBAC per instance, multi-tenant, multi-namespace isolation, container-based security with encrypted secrets
- Enterprise features: OAuth (OIDC), audit logging, OpenTelemetry, Prometheus metrics
- Registry: Hybrid registry combining upstream catalogs, local definitions, and remote proxying
- Deployment:
helm upgrade -i toolhive-operator oci://ghcr.io/stacklok/toolhive/toolhive-operator
Note: The CRD is still in alpha and not yet recommended for production use.
Docker Compose Patterns
For development and smaller deployments, Docker Compose provides a straightforward orchestration model:
- The
phildougherty/mcp-composeproject offers Docker Compose-style YAML for multiple MCP servers with a unified HTTP proxy, STDIO-to-HTTP translation, session management, connection pooling, and health checks - Docker’s official blog describes defining agents and MCP tools in
compose.yaml, spinning up full agentic stacks withdocker compose up
For containerization patterns, see our MCP with Docker and Containers guide.
Architecture Decision Framework
| Question | Sidecar Pattern | Gateway Pattern | Event-Driven |
|---|---|---|---|
| How many MCP servers? | 1-3 per service | 5+ centrally managed | N/A (complements either) |
| Need centralized auth? | No | Yes | No |
| Need cross-service tracing? | Use service mesh | Built into gateway | Use OTel with broker |
| Stateful sessions? | Each sidecar manages own | Gateway handles routing | Stateless by nature |
| Team structure? | Service teams own their MCP | Platform team manages | Event team manages broker |
| Scaling strategy? | Scale with parent service | Scale gateway independently | Scale consumers independently |
When to Use Each
Sidecar — When each microservice needs its own AI-accessible tools and your team already operates a service mesh. Low coordination overhead, but each service team must maintain their MCP server.
Gateway — When you need centralized security, observability, and lifecycle management. Higher initial setup cost, but dramatically simpler for operating 10+ MCP servers. This is where most enterprises land.
Event-Driven — When MCP servers need to react to streaming data or participate in asynchronous workflows. Use alongside either sidecar or gateway patterns, not as a replacement.
Hybrid — Most production deployments combine patterns. Internal tools run as sidecars in the service mesh. External-facing tools go through a gateway. Real-time data flows through event-driven MCP servers consuming from Kafka or NATS.
The MCP Microservices Ecosystem
| Project | Stars | What It Does |
|---|---|---|
| IBM/mcp-context-forge | 3,500+ | AI Gateway, registry, proxy for MCP/A2A/REST/gRPC |
| stacklok/toolhive | 1,700+ | Enterprise MCP platform with Kubernetes operator |
| containers/kubernetes-mcp-server | 1,200+ | Go-native Kubernetes management via MCP |
| Flux159/mcp-server-kubernetes | 1,100+ | Node.js Kubernetes management via MCP |
| microsoft/mcp-gateway | 549 | Session-aware reverse proxy for Kubernetes |
| e2b-dev/awesome-mcp-gateways | 87 | Curated list of MCP gateways |
| traceloop/opentelemetry-mcp-server | — | OTel trace querying across backends |
| 3loka/consul-mcp-server | — | Consul service discovery MCP server |
| phildougherty/mcp-compose | — | Docker Compose for MCP servers |
Industry players actively building MCP microservice infrastructure: Microsoft, IBM, Docker, Kong, Traefik Labs, Envoy/Tetrate, Amazon/AWS, Stacklok, Portkey, LiteLLM/BerriAI, Spring AI (VMware/Broadcom), Confluent, StreamNative, and Apache APISIX.
What’s Coming
The MCP microservices landscape is evolving toward enterprise readiness:
SEP-1442 (Stateless MCP) — If adopted, this eliminates the sticky session problem entirely. Standard load balancers, auto-scaling, and blue-green deployments would work with MCP servers the same way they work with REST APIs. This is the most impactful pending change for microservice adoption.
Kubernetes operators maturing — ToolHive’s MCPServer CRD is alpha today but represents the direction: declaring MCP servers as Kubernetes-native resources with automated lifecycle management.
Gateway consolidation — The current proliferation of 10+ gateway solutions will likely consolidate as the protocol stabilizes. Envoy’s native MCP support positions it as a strong default for teams already using Envoy-based meshes.
OpenTelemetry standardization — The pending proposals to embed trace context in the MCP protocol would make distributed tracing seamless rather than requiring per-framework instrumentation.
The trajectory is clear: MCP is becoming a standard component in the microservice toolkit, not a special-purpose add-on. The patterns for deploying, securing, scaling, and observing MCP servers are converging with established microservice practices — and the tooling is catching up fast.
Related Guides
- MCP Gateway and Proxy Patterns — deep dive into gateway architectures
- MCP with Docker and Containers — containerization patterns
- MCP Error Handling and Resilience — resilience patterns
- MCP Logging and Observability — monitoring and tracing
- MCP Registry and Server Discovery — discovery mechanisms
- MCP Transports Explained — transport layer details
- MCP on Serverless — serverless deployment patterns
- MCP Server Security — security best practices