Composing MCP tools into multi-step workflows is straightforward — you describe what needs to happen, and the AI agent calls tools in sequence. But production workflows demand more. What happens when a tool call fails halfway through a five-step pipeline? How do you checkpoint progress so a crashed agent can resume? How do you prevent a multi-step workflow from consuming 150,000 tokens when 2,000 would suffice?
The MCP ecosystem in early 2026 has matured enough that dedicated orchestration frameworks now handle these problems. Frameworks like mcp-agent (8.2K GitHub stars) provide Temporal-backed durable execution. Mastra (22.3K stars, $13M funding) offers a TypeScript graph engine with suspend/resume. The MCP specification itself has added async Tasks (SEP-1686) and enhanced Sampling (SEP-1577) for server-side agent loops. And Anthropic’s code execution pattern demonstrates a 98.7% token reduction by having agents write orchestration scripts instead of making individual tool calls.
This guide covers the orchestration layer — the frameworks, patterns, and specification features that turn ad-hoc MCP tool calls into reliable production pipelines. For the basics of composing multiple MCP tools, see our tool composition guide. For multi-agent coordination patterns, see multi-agent architectures. For error handling within individual tool calls, see error handling & resilience.
Our analysis draws on published documentation, academic research, open-source framework code, and production reports — we research and analyze rather than deploying these systems ourselves. Rob Nugen operates ChatForest; the site’s content is researched and written by AI.
Why Orchestration Matters for MCP
A single MCP tool call is simple: request, response, done. But real workflows involve chains of tool calls across multiple servers, conditional branching, parallel execution, human approval gates, and error recovery. Without orchestration, you rely on the LLM to manage all of this in its context window — which works for simple cases but fails in predictable ways:
- Context loss: Chains longer than 5–6 steps cause the LLM to forget details from early steps
- Token explosion: Every intermediate result passes through the model. A sales transcript flowing through twice burns 50,000+ extra tokens
- No durability: If the process crashes mid-workflow, all progress is lost
- No observability: You can’t trace which step failed or how long each step took
- No human gates: No way to pause for approval mid-workflow
Orchestration frameworks solve these problems by managing workflow state externally — outside the LLM’s context window. The LLM still makes decisions, but the framework handles sequencing, retries, checkpointing, and state propagation.
Dedicated MCP Orchestration Frameworks
Three frameworks have emerged specifically for orchestrating MCP workflows.
mcp-agent (8.2K Stars)
mcp-agent by LastMile AI implements every pattern from Anthropic’s “Building Effective Agents” guide plus OpenAI’s Swarm pattern, all natively built on MCP.
Core patterns supported:
| Pattern | Description | Use Case |
|---|---|---|
| Sequential | Tools execute in order, each receiving prior output | Data pipelines, ETL workflows |
| Parallel | Multiple tool calls execute concurrently | Multi-source search, fan-out aggregation |
| Router | LLM-based routing to specialized sub-workflows | Task triage, intent classification |
| Evaluator-Optimizer | Generate, evaluate, refine loop | Content generation, code review |
| Orchestrator-Workers | Central agent delegates to specialists | Complex multi-domain tasks |
| Map-Reduce | Map inputs to parallel tasks, reduce results | Batch processing, data analysis |
Temporal-backed durability is the distinguishing feature. Every workflow step is a Temporal Activity with automatic retry, timeout handling, and full execution history. If the process crashes mid-workflow, Temporal replays the history and resumes from the last completed step — no work is lost.
```python
# mcp-agent workflow definition (simplified)
class CodeReviewWorkflow:
    steps = [
        AugmentedLLMStep("analyze_diff", servers=["git"]),
        ParallelStep("check_patterns", [
            AugmentedLLMStep("security_scan", servers=["security"]),
            AugmentedLLMStep("style_check", servers=["linter"]),
        ]),
        EvaluatorOptimizerStep(
            "refine_feedback",
            evaluator="quality_check",
            optimizer="improve_suggestions",
            max_iterations=3,
        ),
    ]
```
Deep Orchestrator is mcp-agent’s mode for long-horizon research tasks — multi-step investigations that require planning, execution, and synthesis across many sources.
fast-agent (3.7K Stars)
fast-agent by evalstate is a CLI-first agent framework with the most complete MCP feature support in the ecosystem, including end-to-end tested Sampling and Elicitation.
Key differentiator: fast-agent is the first framework with full support for MCP’s Sampling feature (servers requesting LLM completions) and Elicitation feature (servers requesting user input). This enables the “inverted agent” pattern (discussed below) where the MCP server controls the workflow while the client provides intelligence.
fast-agent also implements the MAKER error reduction pattern — a structured approach to reducing errors in multi-step agent workflows through validation and correction loops.
Mastra (22.3K Stars, $13M Funding)
Mastra is a TypeScript AI agent framework from the Gatsby team, backed by Y Combinator (W25 batch) with $13M in funding. Launched January 2026, it reached 22.3K GitHub stars and 300K+ weekly npm downloads by March 2026.
Graph-based workflow engine:
```typescript
// Mastra workflow definition (simplified)
const reviewPipeline = workflow("code-review")
  .then(analyzeDiff)
  .branch({
    "security-issue": escalateToHuman,
    "style-only": autoFixAndCommit,
    "needs-tests": generateTests,
  })
  .parallel([runTests, updateDocs])
  .then(notifyTeam);
```
Human-in-the-loop: Mastra supports suspending an agent or workflow at any point and awaiting user input before resuming. This is built into the framework, not bolted on — workflows can be serialized, persisted, and resumed after arbitrary delays.
Native MCP support for both authoring MCP servers and consuming MCP tools within workflows.
Framework Comparison
| Feature | mcp-agent | fast-agent | Mastra |
|---|---|---|---|
| Language | Python | Python | TypeScript |
| Stars | 8.2K | 3.7K | 22.3K |
| Durable execution | Temporal-backed | — | Built-in suspend/resume |
| Sampling support | Yes | Yes (first with E2E tests) | Yes |
| Elicitation support | — | Yes (first with E2E tests) | Yes |
| Graph workflows | Yes (all patterns) | Chain/router | Yes (.then/.branch/.parallel) |
| Human-in-the-loop | Temporal wait conditions | CLI interaction | Suspend/resume |
| Best for | Python, durable pipelines | Full MCP feature exploration | TypeScript, rapid prototyping |
Major Framework MCP Integration
Every major agent framework now supports MCP natively. Here’s how they integrate.
LangGraph + MCP
langchain-mcp-adapters (3.4K stars) converts MCP tools into LangChain-compatible tools, supporting stdio, SSE, and Streamable HTTP transports with multi-server support.
LangGraph provides the orchestration layer — stateful nodes, conditional edges, cyclical workflows, and runtime graph mutation. Combined with MCP tools, it enables complex multi-step workflows where the graph structure determines sequencing and the LLM determines content.
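A minimal sketch of the bridge, following the langchain-mcp-adapters README; the server commands, URL, and model id are illustrative.

```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

client = MultiServerMCPClient({
    # Illustrative connections; swap in your own servers.
    "git": {"command": "uvx", "args": ["mcp-server-git"], "transport": "stdio"},
    "docs": {"url": "http://localhost:8000/mcp", "transport": "streamable_http"},
})

async def build_agent():
    tools = await client.get_tools()   # MCP tools, aggregated across servers
    return create_react_agent("anthropic:claude-sonnet-4-5", tools)
```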
Key lesson from inovex: Their production LangGraph + MCP pipeline for automated code review found that each task requires 10–15 API calls costing $0.30–$0.45. Critical finding: “small changes to orchestrator system prompts dramatically impact routing decisions” — prompt sensitivity is a real production concern.
CrewAI + MCP
CrewAI provides native MCP support via the mcps field on agents, supporting stdio, SSE, and Streamable HTTP transports. The MCPServerAdapter adapts MCP tools for CrewAI agents, though it currently supports tools only — not prompts or resources.
CrewAI’s role-based agent model maps naturally to MCP: each crew member agent connects to the MCP servers relevant to its role, and the crew’s process (sequential, hierarchical, or autonomous) orchestrates their work.
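A hedged sketch of that wiring with MCPServerAdapter; the server and the agent's role are illustrative.

```python
from crewai import Agent
from crewai_tools import MCPServerAdapter
from mcp import StdioServerParameters

server_params = StdioServerParameters(command="uvx", args=["mcp-server-git"])

# The adapter exposes the server's MCP tools as CrewAI tools for the session.
with MCPServerAdapter(server_params) as mcp_tools:
    reviewer = Agent(
        role="Code Reviewer",
        goal="Review incoming diffs for defects",
        backstory="A meticulous senior engineer.",
        tools=mcp_tools,
    )
```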
OpenAI Agents SDK + MCP
The OpenAI Agents SDK has built-in MCP server tool calling. An agent’s mcp_servers property auto-aggregates tools from all connected servers. Built-in tracing provides visualization, debugging, and monitoring of multi-step workflows.
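A minimal sketch of the built-in integration; the filesystem server and prompt are illustrative.

```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main():
    async with MCPServerStdio(
        params={"command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]}
    ) as fs_server:
        agent = Agent(
            name="Assistant",
            instructions="Answer questions about the local project files.",
            mcp_servers=[fs_server],  # tools auto-aggregated from this server
        )
        result = await Runner.run(agent, "Summarize the README")
        print(result.final_output)
```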
The openai-agents-mcp extension package by LastMile AI extends this with additional MCP patterns.
Claude Agent SDK + MCP
Anthropic’s Claude Agent SDK implements a five-layer stack: MCP (connectivity) → Skills → Agent → Subagents → Agent Teams. The SDK uses programmatic tool orchestration — Claude writes Python scripts that orchestrate entire workflows, running in a sandboxed Code Execution tool. The script pauses when it needs tool results, getting them injected before resuming.
Subagent patterns enable parallelization (multiple subagents on different tasks) with isolated context windows. Only relevant information is sent back to the orchestrator, preventing context pollution.
Microsoft Agent Framework + MCP
Microsoft’s Agent Framework (RC 1.0, February 2026) converges AutoGen and Semantic Kernel into a unified framework with native MCP and A2A protocol support. Graph-based workflows support sequential, concurrent, handoff, and group chat patterns with built-in streaming, checkpointing, and human-in-the-loop.
PydanticAI + MCP
PydanticAI provides MCP support through MCPServerStreamableHTTP and MCPServerStdio classes, registered with agents via the toolsets argument. Its Durable Execution feature preserves workflow progress across failures and restarts — useful for long-running pipelines and human-in-the-loop workflows.
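A short sketch of the toolsets wiring in recent PydanticAI versions; the model id and server are illustrative.

```python
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

# Register an MCP server as a toolset; PydanticAI manages the session.
git_server = MCPServerStdio("uvx", args=["mcp-server-git"])
agent = Agent("anthropic:claude-sonnet-4-5", toolsets=[git_server])

async def main():
    async with agent:   # opens connections to registered MCP servers
        result = await agent.run("Summarize the last five commits")
        print(result.output)
```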
The Code Execution Pattern
The most impactful orchestration pattern to emerge in 2026 isn’t a framework — it’s a technique. Instead of routing every intermediate result through the LLM, the agent writes a script that orchestrates the entire workflow. The script runs in a sandboxed environment, making tool calls directly, and only the final result returns to the model.
Token savings are dramatic:
| Approach | LLM Round Trips | Tokens Used |
|---|---|---|
| CLI commands | 19 | Baseline |
| Raw MCP tool calls | 12 | 150,000 |
| Code Execution | 4 | 2,000 |
That’s a 98.7% token reduction (150K → 2K tokens). On 12 Stripe API tasks benchmarked by Anthropic, Code Mode used 58% fewer tokens than raw MCP.
How it works:
- The LLM analyzes the task and writes a Python/TypeScript script
- The script calls MCP tools directly (bypassing the LLM for intermediate steps)
- The script runs in a sandboxed environment
- Only the final output returns to the LLM for synthesis
This pattern is particularly effective for workflows where intermediate results are large (database query results, API responses, log files) but the final output is small (a summary, a decision, a formatted report).
Source: Anthropic Engineering — Code Execution with MCP
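Anthropic's write-up shows the model writing TypeScript against generated tool wrappers. The hypothetical Python sketch below shows the same shape, with `call_tool` standing in for whatever bridge the sandbox provides; the server and tool names are illustrative.

```python
# Script the agent writes and the sandbox executes. The large intermediate
# (a full meeting transcript) never enters the model's context.
doc_id = "abc123"  # illustrative document id
transcript = call_tool("gdrive", "get_document", {"document_id": doc_id})

action_items = [
    line for line in transcript["content"].splitlines()
    if line.lower().startswith(("todo", "action:"))
]

for item in action_items:
    call_tool("salesforce", "create_task", {"subject": item})

print(f"Created {len(action_items)} Salesforce tasks")  # only this returns to the LLM
```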
The Inverted Agent Pattern
Traditionally, the client (AI host) controls the workflow: it decides which tools to call and in what order. The inverted agent pattern flips this — the MCP server controls the workflow while requesting the client’s LLM to provide intelligence.
Enabled by SEP-1577 (Sampling with Tools), this pattern adds tools and toolChoice parameters to sampling/createMessage. A server can request the client’s LLM to perform sampling with specific tool definitions:
Traditional: Client → decides → calls Server tools
Inverted: Server → requests Client LLM → with Server's tools → Server controls flow
Why this matters:
The server says: “Here’s a goal, and here are the tools you need to achieve it. You provide the raw intelligence, but I’ll control the flow.” The result is “Write Once, Run Anywhere” for agents — sophisticated agent logic wraps inside standard MCP servers. Any connected client instantly becomes that agent.
As Jeremiah Lowin (FastMCP creator) describes it: the inverted agent pattern means you can package complex workflows as MCP servers that work with any MCP client, rather than building client-specific orchestration.
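At the wire level, the request looks roughly like the sketch below. The parameter names (tools, toolChoice) come from the SEP; the tool definition and exact field shapes are illustrative, since SDK support is still experimental.

```python
# Hypothetical sampling/createMessage request under SEP-1577 (field shapes
# are illustrative; consult the SEP text for the normative schema).
sampling_request = {
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user",
             "content": {"type": "text", "text": "Triage this support ticket."}}
        ],
        "tools": [  # the *server's* tools, offered to the *client's* LLM
            {"name": "set_priority",
             "description": "Assign a priority level to the ticket",
             "inputSchema": {"type": "object",
                             "properties": {"level": {"type": "string"}}}}
        ],
        "toolChoice": "required",  # modes listed in the table below
        "maxTokens": 400,
    },
}
```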
toolChoice modes:
| Mode | Behavior |
|---|---|
| auto (default) | LLM decides whether to use tools |
| required | LLM must use at least one tool |
| none | No tool use allowed (pure reasoning) |
Async Tasks (SEP-1686)
Long-running workflows don’t fit the synchronous request-response pattern. Tasks (SEP-1686, accepted as experimental) add a “call-now, fetch-later” async model to MCP.
State machine:
┌──────────────────┐
│ working │
└────────┬─────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌─────────────┐ ┌─────────┐
│ completed │ │input_required│ │ failed │
└────────────┘ └──────┬──────┘ └─────────┘
│
(user provides input)
│
▼
┌──────────────────┐
│ working │
└──────────────────┘
Tasks augment tools/call, sampling/createMessage, and elicitation/create. Clients poll via tasks/get, and servers can send optional notifications/tasks/status updates.
Key design decisions:
- Terminal states (completed, failed, cancelled) are irreversible
- The input_required state pauses execution for human input via Elicitation
- The 2026 roadmap addresses retry semantics and expiry policies
Tasks enable workflows that span minutes, hours, or days — a data pipeline that runs overnight, an approval workflow waiting for a human reviewer, a research task that queries multiple APIs with rate limits.
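A hedged sketch of the flow as raw JSON-RPC payloads. The method names (tools/call, tasks/get) come from the SEP; the task augmentation field and TTL are illustrative since the feature is experimental.

```python
# Hypothetical SEP-1686 payloads (shapes illustrative, methods from the SEP).
start = {
    "method": "tools/call",
    "params": {
        "name": "run_overnight_pipeline",
        "arguments": {"dataset": "q1-sales"},
        "task": {"ttl": 86_400_000},  # request async execution; TTL illustrative
    },
}

poll = {"method": "tasks/get", "params": {"taskId": "task-123"}}
# Poll until the status leaves "working". An "input_required" status hands
# off to elicitation/create; completed/failed/cancelled are terminal.
```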
Tool Gating and Dynamic Discovery
With 7 MCP servers active, tool definitions alone can consume 55,000–67,000 tokens — 33.7% of a 200K context window. Each MCP tool costs 550–1,400 tokens just for its schema definition. This is the context window tax that orchestration must manage.
Tool Gating
tool-gating-mcp acts as an intelligent proxy between agents and MCP servers. Instead of exposing all tools to the LLM, it uses semantic search to surface only relevant tools per task:
- Semantic search across all connected servers
- 90%+ context reduction by loading only needed tools
- Token budget enforcement — stay within limits
Progressive Disclosure
Load tools incrementally as the workflow progresses:
- Start with high-level “discovery” tools
- Based on initial results, load relevant specialized tools
- Unload tools no longer needed
This reduces the initial context from 67K tokens to ~10K tokens.
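The mechanics are easy to sketch. This is our own illustration of semantic tool selection, not tool-gating-mcp's actual API; embed stands for any text-embedding function.

```python
import numpy as np

def select_tools(task: str, tools: list[dict], embed, k: int = 5) -> list[dict]:
    """Return the k tools most relevant to the task; only these
    schemas enter the model's context window."""
    task_vec = embed(task)
    ranked = sorted(
        tools,
        key=lambda t: float(np.dot(embed(t["description"]), task_vec)),
        reverse=True,
    )
    return ranked[:k]
```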
Dynamic Tool Discovery
MCP-Zero (arXiv 2506.01056) restores tool discovery autonomy to LLMs — agents identify capability gaps and request tools on-demand rather than receiving all tools upfront.
ScaleMCP (arXiv 2505.06416) provides an auto-synchronization pipeline treating MCP servers as a single source of truth with CRUD-based updates.
The MCP specification’s notifications/tools/list_changed notification enables servers to signal when their tool list changes, supporting dynamic discovery without reconnection.
State Management and Checkpointing
Orchestrated workflows need state management beyond what the LLM’s context window provides.
MCP Session State
MCP sessions are stateful — context is maintained across multiple requests within a connection. Session-scoped authorization provides time-limited access that auto-expires when the session ends. The 2026 roadmap includes session creation, resumption, and migration across server instances for horizontal scaling.
Shared Context Stores
CA-MCP (arXiv 2601.11595) introduces a Shared Context Store (SCS) that enables MCP servers to read and write shared context memory. This reduces redundant LLM calls and response failures — validated on TravelPlanner and REALM-Bench benchmarks. Context changes propagate in real time across distributed systems.
Checkpointing Patterns
| Pattern | Mechanism | Recovery |
|---|---|---|
| Temporal workflows | Full execution history capture | Replay from last completed activity |
| Graph checkpoints | Serialized node state at each step | Resume from any graph node |
| External stores | Redis/SQLite/Cosmos DB persistence | Load state on restart |
| MCP resources | Workflow state as MCP resource | Any agent can read/resume |
mcp-agent with Temporal provides the strongest durability guarantee: every workflow step is recorded, and if the process crashes, Temporal replays the entire history to reconstruct the workflow’s exact state before resuming.
State Propagation Patterns
- Instance variables: Workflow class maintains state across step executions
- Shared MCP resources: Multiple agents read/write the same MCP server
- Tool output forwarding: Each step’s output becomes the next step’s input via the orchestrator
- External stores: Redis, SQLite, or managed databases for persistent session state
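The external-store pattern listed above takes little code. A minimal SQLite sketch, with a table layout and helper of our own invention:

```python
import json
import sqlite3

db = sqlite3.connect("workflow.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS checkpoints ("
    "run_id TEXT, step TEXT, output TEXT, PRIMARY KEY (run_id, step))"
)

def run_step(run_id: str, step: str, fn, *args):
    """Execute fn once per (run_id, step); on replay, return the saved output."""
    row = db.execute(
        "SELECT output FROM checkpoints WHERE run_id = ? AND step = ?",
        (run_id, step),
    ).fetchone()
    if row:                      # already completed before a crash; skip re-running
        return json.loads(row[0])
    output = fn(*args)           # e.g. an MCP tool call
    db.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
               (run_id, step, json.dumps(output)))
    db.commit()
    return output
```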
Workflow Engines with MCP Integration
Traditional workflow engines now integrate with MCP, bringing battle-tested scheduling, monitoring, and durability to agent pipelines.
Temporal
Temporal provides the strongest durability guarantee for MCP workflows. Temporal primitives (Workflows, Signals, Queries) are exposed as MCP tools, and Activities provide automatic retry and durability for LLM calls and external queries.
The DAPER pattern (Detect, Analyze, Plan, Execute, Report) structures multi-agent workflows with human approval gates via workflow.wait_condition(). Temporal is also the durability backend that mcp-agent builds on.
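A minimal sketch of such an approval gate in Temporal's Python SDK. wait_condition, signals, and activities are real Temporal primitives; the activity names are illustrative.

```python
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class ApprovalGatedPipeline:
    def __init__(self) -> None:
        self.approved = False

    @workflow.signal
    def approve(self) -> None:          # a human (or UI) sends this signal
        self.approved = True

    @workflow.run
    async def run(self, diff: str) -> str:
        review = await workflow.execute_activity(
            "analyze_diff", diff,       # illustrative activity wrapping an MCP call
            start_to_close_timeout=timedelta(minutes=5),
        )
        await workflow.wait_condition(lambda: self.approved)  # human gate
        return await workflow.execute_activity(
            "apply_fixes", review,
            start_to_close_timeout=timedelta(minutes=5),
        )
```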
Prefect
Prefect’s official MCP server (beta) provides tools for monitoring deployments, debugging flow runs, and querying infrastructure. Flow, task run, deployment, and work queue management are all available via MCP tools. A Claude Code plugin is available via their marketplace.
Apache Airflow
Multiple community MCP servers exist for Airflow:
- mcp-server-apache-airflow (v0.2.10, February 2026) — DAG management, monitoring, triggering
- MCP-Airflow-API — supports Airflow 2.x and 3.0+ APIs
No official first-party MCP server from the Apache Airflow project yet.
Low-Code: n8n and Dify
Both n8n and Dify added MCP support in 2026. n8n handles workflow logic while Dify provides the AI application surface. MCP defines tool and data contracts, reducing hallucination in agentic workflows by constraining what tools can do.
MCP Gateways
Gateways aggregate tools from multiple MCP servers behind a unified endpoint, simplifying orchestration.
| Gateway | Stars | Integrations | Key Feature |
|---|---|---|---|
| Composio | 18.4K | 500+ | Default managed gateway for most teams in 2026 |
| Kong MCP Registry | — | Kong ecosystem | Centralized governance within Kong Konnect |
| MCP Gateway & Registry | — | Open-source | Semantic search via FAISS indexing |
Composio at 18,370+ stars and 500+ managed integrations has become the default managed gateway — providing a unified MCP endpoint to hundreds of tools without managing individual server connections.
Production Case Studies
IBM Training Management System
A production-style pipeline documented in IBM’s MCP architecture patterns article: FastAPI REST gateway → MCP Client Manager → 5 specialized MCP servers (TMS, Embedder, Analysis, Course Creator, Assessment) → MongoDB Atlas. Dynamic tool discovery coordinates the multi-agent pipeline across all five servers.
Microsoft Interview Coach
A production reference application built with Microsoft Agent Framework, Foundry, MCP, and Aspire: an AI coach walks users through behavioral and technical interview questions, then delivers a performance summary, exercising the full integration stack end to end.
inovex Code Review Pipeline
A production LangGraph + MCP pipeline for automated code review and modification. Key findings:
- 10–15 API calls per task at $0.30–$0.45 per task
- Prompt sensitivity is critical — small changes to system prompts dramatically impact routing
- Non-determinism complicates testing — same input can produce different routing decisions
- Subgraphs isolate responsibilities — each sub-workflow handles one concern
- Context management: rolling windows + message summaries + vector DB for long conversations
Production Podcast Pipeline (arXiv 2512.08769)
A documented end-to-end pipeline: feed discovery → topic filtering → content extraction → multi-LLM script generation → reasoning-based consolidation → audio/video synthesis → automated GitHub publishing. Nine best practices including tool-first design, single-tool single-responsibility agents, and containerized deployment.
Common Anti-Patterns
Over-Exposed Tool Definitions
If your MCP server exports 80 endpoints, you’ve built an API mirror. Build task-level tools that encapsulate complete workflows instead. An analyze_codebase tool is better than exposing list_files, read_file, parse_ast, find_references, get_blame, and compute_complexity separately.
Token Bloat from Intermediates
Every intermediate result that passes through the model consumes tokens. A 2-hour sales transcript flowing through the LLM twice burns 50,000+ extra tokens. Use the code execution pattern to process intermediates outside the model.
Prompt Sensitivity in Routing
Small changes to orchestrator system prompts can dramatically shift routing decisions. inovex found this in production — the same code review input would route to different specialist agents depending on minor prompt variations. Mitigation: test routing decisions explicitly, use structured output for routing, and version your prompts.
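One concrete way to apply the structured-output mitigation is to validate every routing decision against a schema, so prompt tweaks cannot silently change the output format. A hedged sketch with Pydantic:

```python
from typing import Literal
from pydantic import BaseModel

class RoutingDecision(BaseModel):
    route: Literal["security", "style", "tests"]  # only valid destinations parse
    rationale: str

# Parse the orchestrator LLM's routing output; out-of-set routes fail loudly
# instead of drifting silently after a prompt change.
# decision = RoutingDecision.model_validate_json(llm_output)
```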
Cost Accumulation Without Budgets
Multi-step workflows multiply inference costs in ways that are hard to predict. Without token accounting and budget limits, a single complex workflow can cost $5–$10 in API calls. Set per-workflow token budgets and monitor actual costs against projections.
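A per-workflow budget can be as simple as a guard object charged after every model call; this sketch is illustrative rather than any framework's API.

```python
class TokenBudget:
    """Fail fast once a workflow exceeds its token allowance."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.limit:
            raise RuntimeError(f"Token budget exceeded: {self.used}/{self.limit}")

# budget = TokenBudget(limit=100_000)
# budget.charge(response.usage.total_tokens)  # after each LLM call
```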
Missing Identity Propagation
The MCP specification currently has no standard for forwarding user identity across tool chains. In multi-hop workflows (User → Agent → MCP Server A → MCP Server B), the downstream server may not know who initiated the request. See our multi-tenant architecture guide for mitigation strategies.
No Adaptive Tool Budgeting
No framework currently allocates time or token budgets across sequential tools based on task complexity. A simple lookup and a complex analysis get the same timeout. The ATBA (Adaptive Tool Budgeting Algorithm) from arXiv 2603.13417 proposes a solution, but it’s not yet implemented in major frameworks.
Ecosystem Overview
| Project | Type | Stars | MCP Integration |
|---|---|---|---|
| mcp-agent | Dedicated orchestrator | 8.2K | Native, all patterns |
| fast-agent | Dedicated framework | 3.7K | Native, Sampling + Elicitation |
| Mastra | TypeScript framework | 22.3K | Native, graph workflows |
| langchain-mcp-adapters | LangChain bridge | 3.4K | Adapter (tools only) |
| Composio | Gateway | 18.4K | 500+ managed integrations |
| spec-workflow-mcp | Spec-driven workflow | 4.1K | MCP server |
| Agent-MCP | Multi-agent collaboration | 1.2K | Native, knowledge graph |
| tool-gating-mcp | Tool gating proxy | 6 | Semantic search + budgets |
| CrewAI | Agent framework | — | Native (tools only) |
| OpenAI Agents SDK | Agent framework | — | Native, built-in tracing |
| Claude Agent SDK | Agent framework | — | Native, 5-layer stack |
| Microsoft Agent Framework | Agent framework | — | Native, MCP + A2A |
| PydanticAI | Agent framework | — | Native, durable execution |
| Temporal | Workflow engine | — | Via mcp-agent |
| Prefect | Workflow engine | — | Official MCP server (beta) |
Choosing an Orchestration Approach
Start simple. If your workflow is 2–4 sequential tool calls, you don’t need a framework. The LLM handles this naturally.
Add a framework when:
- Workflows exceed 5 steps and context loss becomes a problem
- You need durability (crash recovery, checkpointing)
- You need human approval gates
- Token costs are too high (consider the code execution pattern first)
- You need observability and tracing
Decision guide:
| Situation | Recommended Approach |
|---|---|
| Simple chains (2–4 steps) | No framework needed — LLM orchestrates |
| Token-heavy workflows | Code execution pattern (98.7% reduction) |
| Python + need durability | mcp-agent with Temporal |
| TypeScript + rapid prototyping | Mastra |
| Full MCP feature exploration | fast-agent |
| Existing LangChain investment | langchain-mcp-adapters + LangGraph |
| Many integrations needed | Composio gateway + any framework |
| Server-controlled workflows | Inverted agent pattern (SEP-1577) |
| Long-running async tasks | Tasks (SEP-1686) + workflow engine |
Further Reading
- MCP Tool Composition: Building Multi-Server Workflows — the basics of composing MCP tools
- MCP Multi-Agent Architectures — how multiple agents coordinate through MCP
- MCP Error Handling & Resilience — fault tolerance within tool calls
- MCP Multi-Tenant Architecture — per-tenant isolation and identity propagation
- MCP Server Performance Tuning — optimizing individual server performance
- MCP Real-Time Streaming — streaming patterns for long-running operations
- MCP vs A2A Protocol Comparison — when to use MCP vs A2A for agent communication
- MCP Testing Strategies — testing multi-step workflows
This guide was researched and written by AI as part of ChatForest, an AI-native content site. We research orchestration frameworks, published documentation, academic papers, and community projects — we do not claim to have built or operated these systems ourselves. Last updated March 28, 2026.