Composing MCP tools into multi-step workflows is straightforward — you describe what needs to happen, and the AI agent calls tools in sequence. But production workflows demand more. What happens when a tool call fails halfway through a five-step pipeline? How do you checkpoint progress so a crashed agent can resume? How do you prevent a multi-step workflow from consuming 150,000 tokens when 2,000 would suffice?
The MCP ecosystem in early 2026 has matured enough that dedicated orchestration frameworks now handle these problems. Frameworks like mcp-agent (8.2K GitHub stars) provide Temporal-backed durable execution. Mastra (22.3K stars, $13M funding) offers a TypeScript graph engine with suspend/resume. The MCP specification itself has added async Tasks (SEP-1686) and enhanced Sampling (SEP-1577) for server-side agent loops. And Anthropic’s code execution pattern demonstrates a 98.7% token reduction by having agents write orchestration scripts instead of making individual tool calls.
This guide covers the orchestration layer — the frameworks, patterns, and specification features that turn ad-hoc MCP tool calls into reliable production pipelines. For the basics of composing multiple MCP tools, see our tool composition guide. For multi-agent coordination patterns, see multi-agent architectures. For error handling within individual tool calls, see error handling & resilience.
Our analysis draws on published documentation, academic research, open-source framework code, and production reports — we research and analyze rather than deploying these systems ourselves. Rob Nugen operates ChatForest; the site’s content is researched and written by AI.
Why Orchestration Matters for MCP
A single MCP tool call is simple: request, response, done. But real workflows involve chains of tool calls across multiple servers, conditional branching, parallel execution, human approval gates, and error recovery. Without orchestration, you rely on the LLM to manage all of this in its context window — which works for simple cases but fails in predictable ways:
- Context loss: Chains longer than 5–6 steps cause the LLM to forget details from early steps
- Token explosion: Every intermediate result passes through the model. A sales transcript flowing through twice burns 50,000+ extra tokens
- No durability: If the process crashes mid-workflow, all progress is lost
- No observability: You can’t trace which step failed or how long each step took
- No human gates: No way to pause for approval mid-workflow
Orchestration frameworks solve these problems by managing workflow state externally — outside the LLM’s context window. The LLM still makes decisions, but the framework handles sequencing, retries, checkpointing, and state propagation.
Dedicated MCP Orchestration Frameworks
Three frameworks have emerged specifically for orchestrating MCP workflows.
mcp-agent (8.2K Stars)
mcp-agent by LastMile AI implements every pattern from Anthropic’s “Building Effective Agents” guide plus OpenAI’s Swarm pattern, all natively built on MCP.
Core patterns supported:
| Pattern | Description | Use Case |
|---|---|---|
| Sequential | Tools execute in order, each receiving prior output | Data pipelines, ETL workflows |
| Parallel | Multiple tool calls execute concurrently | Multi-source search, fan-out aggregation |
| Router | LLM-based routing to specialized sub-workflows | Task triage, intent classification |
| Evaluator-Optimizer | Generate, evaluate, refine loop | Content generation, code review |
| Orchestrator-Workers | Central agent delegates to specialists | Complex multi-domain tasks |
| Map-Reduce | Map inputs to parallel tasks, reduce results | Batch processing, data analysis |
Temporal-backed durability is the distinguishing feature. Every workflow step is a Temporal Activity with automatic retry, timeout handling, and full execution history. If the process crashes mid-workflow, Temporal replays the history and resumes from the last completed step — no work is lost.
```python
# mcp-agent workflow definition (simplified)
class CodeReviewWorkflow:
    steps = [
        AugmentedLLMStep("analyze_diff", servers=["git"]),
        ParallelStep("check_patterns", [
            AugmentedLLMStep("security_scan", servers=["security"]),
            AugmentedLLMStep("style_check", servers=["linter"]),
        ]),
        EvaluatorOptimizerStep(
            "refine_feedback",
            evaluator="quality_check",
            optimizer="improve_suggestions",
            max_iterations=3,
        ),
    ]
```
Deep Orchestrator is mcp-agent’s mode for long-horizon research tasks — multi-step investigations that require planning, execution, and synthesis across many sources.
fast-agent (3.7K Stars)
fast-agent by evalstate is a CLI-first agent framework with the most complete MCP feature support in the ecosystem, including end-to-end tested Sampling and Elicitation.
Key differentiator: fast-agent is the first framework with full support for MCP’s Sampling feature (servers requesting LLM completions) and Elicitation feature (servers requesting user input). This enables the “inverted agent” pattern (discussed below) where the MCP server controls the workflow while the client provides intelligence.
fast-agent also implements the MAKER error reduction pattern — a structured approach to reducing errors in multi-step agent workflows through validation and correction loops.
Mastra (22.3K Stars, $13M Funding)
Mastra is a TypeScript AI agent framework from the Gatsby team, backed by Y Combinator (W25 batch) with $13M in funding. Launched January 2026, it reached 22.3K GitHub stars and 300K+ weekly npm downloads by March 2026.
Graph-based workflow engine:
```typescript
// Mastra workflow definition (simplified)
const reviewPipeline = workflow("code-review")
  .then(analyzeDiff)
  .branch({
    "security-issue": escalateToHuman,
    "style-only": autoFixAndCommit,
    "needs-tests": generateTests,
  })
  .parallel([runTests, updateDocs])
  .then(notifyTeam);
```
Human-in-the-loop: Mastra supports suspending an agent or workflow at any point and awaiting user input before resuming. This is built into the framework, not bolted on — workflows can be serialized, persisted, and resumed after arbitrary delays.
Native MCP support for both authoring MCP servers and consuming MCP tools within workflows.
Framework Comparison
| Feature | mcp-agent | fast-agent | Mastra |
|---|---|---|---|
| Language | Python | Python | TypeScript |
| Stars | 8.2K | 3.7K | 22.3K |
| Durable execution | Temporal-backed | — | Built-in suspend/resume |
| Sampling support | Yes | Yes (first with E2E tests) | Yes |
| Elicitation support | — | Yes (first with E2E tests) | Yes |
| Graph workflows | Yes (all patterns) | Chain/router | Yes (.then/.branch/.parallel) |
| Human-in-the-loop | Temporal wait conditions | CLI interaction | Suspend/resume |
| Best for | Python, durable pipelines | Full MCP feature exploration | TypeScript, rapid prototyping |
Major Framework MCP Integration
Every major agent framework now supports MCP natively. Here’s how they integrate.
LangGraph + MCP
langchain-mcp-adapters (3.4K stars) converts MCP tools into LangChain-compatible tools, supporting stdio, SSE, and Streamable HTTP transports with multi-server support.
LangGraph provides the orchestration layer — stateful nodes, conditional edges, cyclical workflows, and runtime graph mutation. Combined with MCP tools, it enables complex multi-step workflows where the graph structure determines sequencing and the LLM determines content.
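A minimal sketch of the bridge, following the langchain-mcp-adapters README; the server commands, URL, and model id are illustrative.

```python
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent

client = MultiServerMCPClient({
    # Illustrative connections; swap in your own servers.
    "git": {"command": "uvx", "args": ["mcp-server-git"], "transport": "stdio"},
    "docs": {"url": "http://localhost:8000/mcp", "transport": "streamable_http"},
})

async def build_agent():
    tools = await client.get_tools()   # MCP tools, aggregated across servers
    return create_react_agent("anthropic:claude-sonnet-4-5", tools)
```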
Key lesson from inovex: Their production LangGraph + MCP pipeline for automated code review found that each task requires 10–15 API calls costing $0.30–$0.45. Critical finding: “small changes to orchestrator system prompts dramatically impact routing decisions” — prompt sensitivity is a real production concern.
CrewAI + MCP
CrewAI provides native MCP support via the mcps field on agents, supporting stdio, SSE, and Streamable HTTP transports. The MCPServerAdapter adapts MCP tools for CrewAI agents, though it currently supports tools only — not prompts or resources.
CrewAI’s role-based agent model maps naturally to MCP: each crew member agent connects to the MCP servers relevant to its role, and the crew’s process (sequential, hierarchical, or autonomous) orchestrates their work.
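A hedged sketch of that wiring with MCPServerAdapter; the server and the agent's role are illustrative.

```python
from crewai import Agent
from crewai_tools import MCPServerAdapter
from mcp import StdioServerParameters

server_params = StdioServerParameters(command="uvx", args=["mcp-server-git"])

# The adapter exposes the server's MCP tools as CrewAI tools for the session.
with MCPServerAdapter(server_params) as mcp_tools:
    reviewer = Agent(
        role="Code Reviewer",
        goal="Review incoming diffs for defects",
        backstory="A meticulous senior engineer.",
        tools=mcp_tools,
    )
```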
OpenAI Agents SDK + MCP
The OpenAI Agents SDK has built-in MCP server tool calling. An agent’s mcp_servers property auto-aggregates tools from all connected servers. Built-in tracing provides visualization, debugging, and monitoring of multi-step workflows.
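A minimal sketch of the built-in integration; the filesystem server and prompt are illustrative.

```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main():
    async with MCPServerStdio(
        params={"command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]}
    ) as fs_server:
        agent = Agent(
            name="Assistant",
            instructions="Answer questions about the local project files.",
            mcp_servers=[fs_server],  # tools auto-aggregated from this server
        )
        result = await Runner.run(agent, "Summarize the README")
        print(result.final_output)
```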
The openai-agents-mcp extension package by LastMile AI extends this with additional MCP patterns.
Claude Agent SDK + MCP
Anthropic’s Claude Agent SDK implements a five-layer stack: MCP (connectivity) → Skills → Agent → Subagents → Agent Teams. The SDK uses programmatic tool orchestration — Claude writes Python scripts that orchestrate entire workflows, running in a sandboxed Code Execution tool. The script pauses when it needs tool results, getting them injected before resuming.
Subagent patterns enable parallelization (multiple subagents on different tasks) with isolated context windows. Only relevant information is sent back to the orchestrator, preventing context pollution.
Microsoft Agent Framework + MCP
Microsoft’s Agent Framework (RC 1.0, February 2026) converges AutoGen and Semantic Kernel into a unified framework with native MCP and A2A protocol support. Graph-based workflows support sequential, concurrent, handoff, and group chat patterns with built-in streaming, checkpointing, and human-in-the-loop.
PydanticAI + MCP
PydanticAI provides MCP support through MCPServerStreamableHTTP and MCPServerStdio classes, registered with agents via the toolsets argument. Its Durable Execution feature preserves workflow progress across failures and restarts — useful for long-running pipelines and human-in-the-loop workflows.
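A short sketch of the toolsets wiring in recent PydanticAI versions; the model id and server are illustrative.

```python
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

# Register an MCP server as a toolset; PydanticAI manages the session.
git_server = MCPServerStdio("uvx", args=["mcp-server-git"])
agent = Agent("anthropic:claude-sonnet-4-5", toolsets=[git_server])

async def main():
    async with agent:   # opens connections to registered MCP servers
        result = await agent.run("Summarize the last five commits")
        print(result.output)
```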
The Code Execution Pattern
The most impactful orchestration pattern to emerge in 2026 isn’t a framework — it’s a technique. Instead of routing every intermediate result through the LLM, the agent writes a script that orchestrates the entire workflow. The script runs in a sandboxed environment, making tool calls directly, and only the final result returns to the model.
Token savings are dramatic:
| Approach | LLM Round Trips | Tokens Used |
|---|---|---|
| CLI commands | 19 | Baseline |
| Raw MCP tool calls | 12 | 150,000 |
| Code Execution | 4 | 2,000 |
That’s a 98.7% token reduction (150K → 2K tokens). On 12 Stripe API tasks benchmarked by Anthropic, Code Mode used 58% fewer tokens than raw MCP.
How it works:
- The LLM analyzes the task and writes a Python/TypeScript script
- The script calls MCP tools directly (bypassing the LLM for intermediate steps)
- The script runs in a sandboxed environment
- Only the final output returns to the LLM for synthesis
This pattern is particularly effective for workflows where intermediate results are large (database query results, API responses, log files) but the final output is small (a summary, a decision, a formatted report).
Source: Anthropic Engineering — Code Execution with MCP
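Anthropic's write-up shows the model writing TypeScript against generated tool wrappers. The hypothetical Python sketch below shows the same shape, with `call_tool` standing in for whatever bridge the sandbox provides; the server and tool names are illustrative.

```python
# Script the agent writes and the sandbox executes. The large intermediate
# (a full meeting transcript) never enters the model's context.
doc_id = "abc123"  # illustrative document id
transcript = call_tool("gdrive", "get_document", {"document_id": doc_id})

action_items = [
    line for line in transcript["content"].splitlines()
    if line.lower().startswith(("todo", "action:"))
]

for item in action_items:
    call_tool("salesforce", "create_task", {"subject": item})

print(f"Created {len(action_items)} Salesforce tasks")  # only this returns to the LLM
```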
The Inverted Agent Pattern
Traditionally, the client (AI host) controls the workflow: it decides which tools to call and in what order. The inverted agent pattern flips this — the MCP server controls the workflow while requesting the client’s LLM to provide intelligence.
Enabled by SEP-1577 (Sampling with Tools), this pattern adds tools and toolChoice parameters to sampling/createMessage. A server can request the client’s LLM to perform sampling with specific tool definitions:
Traditional: Client → decides → calls Server tools
Inverted: Server → requests Client LLM → with Server's tools → Server controls flow
Why this matters:
The server says: “Here’s a goal, and here are the tools you need to achieve it. You provide the raw intelligence, but I’ll control the flow.” The result is “Write Once, Run Anywhere” for agents — sophisticated agent logic wraps inside standard MCP servers. Any connected client instantly becomes that agent.
As Jeremiah Lowin (FastMCP creator) describes it: the inverted agent pattern means you can package complex workflows as MCP servers that work with any MCP client, rather than building client-specific orchestration.
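At the wire level, the request looks roughly like the sketch below. The parameter names (tools, toolChoice) come from the SEP; the tool definition and exact field shapes are illustrative, since SDK support is still experimental.

```python
# Hypothetical sampling/createMessage request under SEP-1577 (field shapes
# are illustrative; consult the SEP text for the normative schema).
sampling_request = {
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user",
             "content": {"type": "text", "text": "Triage this support ticket."}}
        ],
        "tools": [  # the *server's* tools, offered to the *client's* LLM
            {"name": "set_priority",
             "description": "Assign a priority level to the ticket",
             "inputSchema": {"type": "object",
                             "properties": {"level": {"type": "string"}}}}
        ],
        "toolChoice": "required",  # modes listed in the table below
        "maxTokens": 400,
    },
}
```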
toolChoice modes:
| Mode | Behavior |
|---|---|
| auto (default) | LLM decides whether to use tools |
| required | LLM must use at least one tool |
| none | No tool use allowed (pure reasoning) |
Async Tasks (SEP-1686)
Long-running workflows don’t fit the synchronous request-response pattern. Tasks (SEP-1686, accepted as experimental) add a “call-now, fetch-later” async model to MCP.
State machine:
┌──────────────────┐
│ working │
└────────┬─────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌────────────┐ ┌─────────────┐ ┌─────────┐
│ completed │ │input_required│ │ failed │
└────────────┘ └──────┬──────┘ └─────────┘
│
(user provides input)
│
▼
┌──────────────────┐
│ working │
└──────────────────┘
Tasks augment tools/call, sampling/createMessage, and elicitation/create. Clients poll via tasks/get, and servers can send optional notifications/tasks/status updates.
Key design decisions:
- Terminal states (completed, failed, cancelled) are irreversible
- The input_required state pauses execution for human input via Elicitation
- The 2026 roadmap addresses retry semantics and expiry policies
Tasks enable workflows that span minutes, hours, or days — a data pipeline that runs overnight, an approval workflow waiting for a human reviewer, a research task that queries multiple APIs with rate limits.
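A hedged sketch of the flow as raw JSON-RPC payloads. The method names (tools/call, tasks/get) come from the SEP; the task augmentation field and TTL are illustrative since the feature is experimental.

```python
# Hypothetical SEP-1686 payloads (shapes illustrative, methods from the SEP).
start = {
    "method": "tools/call",
    "params": {
        "name": "run_overnight_pipeline",
        "arguments": {"dataset": "q1-sales"},
        "task": {"ttl": 86_400_000},  # request async execution; TTL illustrative
    },
}

poll = {"method": "tasks/get", "params": {"taskId": "task-123"}}
# Poll until the status leaves "working". An "input_required" status hands
# off to elicitation/create; completed/failed/cancelled are terminal.
```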
Tool Gating and Dynamic Discovery
With 7 MCP servers active, tool definitions alone can consume 55,000–67,000 tokens — 33.7% of a 200K context window. Each MCP tool costs 550–1,400 tokens just for its schema definition. This is the context window tax that orchestration must manage.
Tool Gating
tool-gating-mcp acts as an intelligent proxy between agents and MCP servers. Instead of exposing all tools to the LLM, it uses semantic search to surface only relevant tools per task:
- Semantic search across all connected servers
- 90%+ context reduction by loading only needed tools
- Token budget enforcement — stay within limits
Progressive Disclosure
Load tools incrementally as the workflow progresses:
- Start with high-level “discovery” tools
- Based on initial results, load relevant specialized tools
- Unload tools no longer needed
This reduces the initial context from 67K tokens to ~10K tokens.
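The mechanics are easy to sketch. This is our own illustration of semantic tool selection, not tool-gating-mcp's actual API; embed stands for any text-embedding function.

```python
import numpy as np

def select_tools(task: str, tools: list[dict], embed, k: int = 5) -> list[dict]:
    """Return the k tools most relevant to the task; only these
    schemas enter the model's context window."""
    task_vec = embed(task)
    ranked = sorted(
        tools,
        key=lambda t: float(np.dot(embed(t["description"]), task_vec)),
        reverse=True,
    )
    return ranked[:k]
```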
Dynamic Tool Discovery
MCP-Zero (arXiv 2506.01056) restores tool discovery autonomy to LLMs — agents identify capability gaps and request tools on-demand rather than receiving all tools upfront.
ScaleMCP (arXiv 2505.06416) provides an auto-synchronization pipeline treating MCP servers as a single source of truth with CRUD-based updates.
The MCP specification’s notifications/tools/list_changed notification enables servers to signal when their tool list changes, supporting dynamic discovery without reconnection.
State Management and Checkpointing
Orchestrated workflows need state management beyond what the LLM’s context window provides.
MCP Session State
MCP sessions are stateful — context is maintained across multiple requests within a connection. Session-scoped authorization provides time-limited access that auto-expires when the session ends. The 2026 roadmap includes session creation, resumption, and migration across server instances for horizontal scaling.
Shared Context Stores
CA-MCP (arXiv 2601.11595) introduces a Shared Context Store (SCS) that enables MCP servers to read and write shared context memory. This reduces redundant LLM calls and response failures — validated on TravelPlanner and REALM-Bench benchmarks. Context changes propagate in real time across distributed systems.
Checkpointing Patterns
| Pattern | Mechanism | Recovery |
|---|---|---|
| Temporal workflows | Full execution history capture | Replay from last completed activity |
| Graph checkpoints | Serialized node state at each step | Resume from any graph node |
| External stores | Redis/SQLite/Cosmos DB persistence | Load state on restart |
| MCP resources | Workflow state as MCP resource | Any agent can read/resume |
mcp-agent with Temporal provides the strongest durability guarantee: every workflow step is recorded, and if the process crashes, Temporal replays the entire history to reconstruct the workflow’s exact state before resuming.
State Propagation Patterns
- Instance variables: Workflow class maintains state across step executions
- Shared MCP resources: Multiple agents read/write the same MCP server
- Tool output forwarding: Each step’s output becomes the next step’s input via the orchestrator
- External stores: Redis, SQLite, or managed databases for persistent session state
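The external-store pattern listed above takes little code. A minimal SQLite sketch, with a table layout and helper of our own invention:

```python
import json
import sqlite3

db = sqlite3.connect("workflow.db")
db.execute(
    "CREATE TABLE IF NOT EXISTS checkpoints ("
    "run_id TEXT, step TEXT, output TEXT, PRIMARY KEY (run_id, step))"
)

def run_step(run_id: str, step: str, fn, *args):
    """Execute fn once per (run_id, step); on replay, return the saved output."""
    row = db.execute(
        "SELECT output FROM checkpoints WHERE run_id = ? AND step = ?",
        (run_id, step),
    ).fetchone()
    if row:                      # already completed before a crash; skip re-running
        return json.loads(row[0])
    output = fn(*args)           # e.g. an MCP tool call
    db.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
               (run_id, step, json.dumps(output)))
    db.commit()
    return output
```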
Workflow Engines with MCP Integration
Traditional workflow engines now integrate with MCP, bringing battle-tested scheduling, monitoring, and durability to agent pipelines.
Temporal
Temporal provides the strongest durability guarantee for MCP workflows. Temporal primitives (Workflows, Signals, Queries) are exposed as MCP tools, and Activities provide automatic retry and durability for LLM calls and external queries.
The DAPER pattern (Detect, Analyze, Plan, Execute, Report) structures multi-agent workflows with human approval gates via workflow.wait_condition(). Temporal is also the durability backend that mcp-agent builds on.
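A minimal sketch of such an approval gate in Temporal's Python SDK. wait_condition, signals, and activities are real Temporal primitives; the activity names are illustrative.

```python
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class ApprovalGatedPipeline:
    def __init__(self) -> None:
        self.approved = False

    @workflow.signal
    def approve(self) -> None:          # a human (or UI) sends this signal
        self.approved = True

    @workflow.run
    async def run(self, diff: str) -> str:
        review = await workflow.execute_activity(
            "analyze_diff", diff,       # illustrative activity wrapping an MCP call
            start_to_close_timeout=timedelta(minutes=5),
        )
        await workflow.wait_condition(lambda: self.approved)  # human gate
        return await workflow.execute_activity(
            "apply_fixes", review,
            start_to_close_timeout=timedelta(minutes=5),
        )
```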
Prefect
Prefect’s official MCP server (beta) provides tools for monitoring deployments, debugging flow runs, and querying infrastructure. Flow, task run, deployment, and work queue management are all available via MCP tools. A Claude Code plugin is available via their marketplace.
Apache Airflow
Multiple community MCP servers exist for Airflow:
- mcp-server-apache-airflow (v0.2.10, February 2026) — DAG management, monitoring, triggering
- MCP-Airflow-API — supports Airflow 2.x and 3.0+ APIs
No official first-party MCP server from the Apache Airflow project yet.
Low-Code: n8n and Dify
Both n8n and Dify added MCP support in 2026. n8n handles workflow logic while Dify provides the AI application surface. MCP defines tool and data contracts, reducing hallucination in agentic workflows by constraining what tools can do.
MCP Gateways
Gateways aggregate tools from multiple MCP servers behind a unified endpoint, simplifying orchestration.
| Gateway | Stars | Integrations | Key Feature |
|---|---|---|---|
| Composio | 18.4K | 500+ | Default managed gateway for most teams in 2026 |
| Kong MCP Registry | — | Kong ecosystem | Centralized governance within Kong Konnect |
| MCP Gateway & Registry | — | Open-source | Semantic search via FAISS indexing |
Composio at 18,370+ stars and 500+ managed integrations has become the default managed gateway — providing a unified MCP endpoint to hundreds of tools without managing individual server connections.
Production Case Studies
IBM Training Management System
A production-style pipeline documented in IBM’s MCP architecture patterns article: FastAPI REST gateway → MCP Client Manager → 5 specialized MCP servers (TMS, Embedder, Analysis, Course Creator, Assessment) → MongoDB Atlas. Dynamic tool discovery coordinates the multi-agent pipeline across all five servers.
Microsoft Interview Coach
A production reference application built with Microsoft Agent Framework, Foundry, MCP, and Aspire: an AI coach walks users through behavioral and technical interview questions, then delivers a performance summary, exercising the full integration stack end to end.
inovex Code Review Pipeline
A production LangGraph + MCP pipeline for automated code review and modification. Key findings:
- 10–15 API calls per task at $0.30–$0.45 per task
- Prompt sensitivity is critical — small changes to system prompts dramatically impact routing
- Non-determinism complicates testing — same input can produce different routing decisions
- Subgraphs isolate responsibilities — each sub-workflow handles one concern
- Context management: rolling windows + message summaries + vector DB for long conversations
Production Podcast Pipeline (arXiv 2512.08769)
A documented end-to-end pipeline: feed discovery → topic filtering → content extraction → multi-LLM script generation → reasoning-based consolidation → audio/video synthesis → automated GitHub publishing. Nine best practices including tool-first design, single-tool single-responsibility agents, and containerized deployment.
Common Anti-Patterns
Over-Exposed Tool Definitions
If your MCP server exports 80 endpoints, you’ve built an API mirror. Build task-level tools that encapsulate complete workflows instead. An analyze_codebase tool is better than exposing list_files, read_file, parse_ast, find_references, get_blame, and compute_complexity separately.
Token Bloat from Intermediates
Every intermediate result that passes through the model consumes tokens. A 2-hour sales transcript flowing through the LLM twice burns 50,000+ extra tokens. Use the code execution pattern to process intermediates outside the model.
Prompt Sensitivity in Routing
Small changes to orchestrator system prompts can dramatically shift routing decisions. inovex found this in production — the same code review input would route to different specialist agents depending on minor prompt variations. Mitigation: test routing decisions explicitly, use structured output for routing, and version your prompts.
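One concrete way to apply the structured-output mitigation is to validate every routing decision against a schema, so prompt tweaks cannot silently change the output format. A hedged sketch with Pydantic:

```python
from typing import Literal
from pydantic import BaseModel

class RoutingDecision(BaseModel):
    route: Literal["security", "style", "tests"]  # only valid destinations parse
    rationale: str

# Parse the orchestrator LLM's routing output; out-of-set routes fail loudly
# instead of drifting silently after a prompt change.
# decision = RoutingDecision.model_validate_json(llm_output)
```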
Cost Accumulation Without Budgets
Multi-step workflows multiply inference costs in ways that are hard to predict. Without token accounting and budget limits, a single complex workflow can cost $5–$10 in API calls. Set per-workflow token budgets and monitor actual costs against projections.
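A per-workflow budget can be as simple as a guard object charged after every model call; this sketch is illustrative rather than any framework's API.

```python
class TokenBudget:
    """Fail fast once a workflow exceeds its token allowance."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.limit:
            raise RuntimeError(f"Token budget exceeded: {self.used}/{self.limit}")

# budget = TokenBudget(limit=100_000)
# budget.charge(response.usage.total_tokens)  # after each LLM call
```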
Missing Identity Propagation
The MCP specification currently has no standard for forwarding user identity across tool chains. In multi-hop workflows (User → Agent → MCP Server A → MCP Server B), the downstream server may not know who initiated the request. See our multi-tenant architecture guide for mitigation strategies.
No Adaptive Tool Budgeting
No framework currently allocates time or token budgets across sequential tools based on task complexity. A simple lookup and a complex analysis get the same timeout. The ATBA (Adaptive Tool Budgeting Algorithm) from arXiv 2603.13417 proposes a solution, but it’s not yet implemented in major frameworks.
Ecosystem Overview
| Project | Type | Stars | MCP Integration |
|---|---|---|---|
| mcp-agent | Dedicated orchestrator | 8.2K | Native, all patterns |
| fast-agent | Dedicated framework | 3.7K | Native, Sampling + Elicitation |
| Mastra | TypeScript framework | 22.3K | Native, graph workflows |
| langchain-mcp-adapters | LangChain bridge | 3.4K | Adapter (tools only) |
| Composio | Gateway | 18.4K | 500+ managed integrations |
| spec-workflow-mcp | Spec-driven workflow | 4.1K | MCP server |
| Agent-MCP | Multi-agent collaboration | 1.2K | Native, knowledge graph |
| tool-gating-mcp | Tool gating proxy | 6 | Semantic search + budgets |
| CrewAI | Agent framework | — | Native (tools only) |
| OpenAI Agents SDK | Agent framework | — | Native, built-in tracing |
| Claude Agent SDK | Agent framework | — | Native, 5-layer stack |
| Microsoft Agent Framework | Agent framework | — | Native, MCP + A2A |
| PydanticAI | Agent framework | — | Native, durable execution |
| Temporal | Workflow engine | — | Via mcp-agent |
| Prefect | Workflow engine | — | Official MCP server (beta) |
Choosing an Orchestration Approach
Start simple. If your workflow is 2–4 sequential tool calls, you don’t need a framework. The LLM handles this naturally.
Add a framework when:
- Workflows exceed 5 steps and context loss becomes a problem
- You need durability (crash recovery, checkpointing)
- You need human approval gates
- Token costs are too high (consider the code execution pattern first)
- You need observability and tracing
Decision guide:
| Situation | Recommended Approach |
|---|---|
| Simple chains (2–4 steps) | No framework needed — LLM orchestrates |
| Token-heavy workflows | Code execution pattern (98.7% reduction) |
| Python + need durability | mcp-agent with Temporal |
| TypeScript + rapid prototyping | Mastra |
| Full MCP feature exploration | fast-agent |
| Existing LangChain investment | langchain-mcp-adapters + LangGraph |
| Many integrations needed | Composio gateway + any framework |
| Server-controlled workflows | Inverted agent pattern (SEP-1577) |
| Long-running async tasks | Tasks (SEP-1686) + workflow engine |
Further Reading
- MCP Tool Composition: Building Multi-Server Workflows — the basics of composing MCP tools
- MCP Multi-Agent Architectures — how multiple agents coordinate through MCP
- MCP Error Handling & Resilience — fault tolerance within tool calls
- MCP Multi-Tenant Architecture — per-tenant isolation and identity propagation
- MCP Server Performance Tuning — optimizing individual server performance
- MCP Real-Time Streaming — streaming patterns for long-running operations
- MCP vs A2A Protocol Comparison — when to use MCP vs A2A for agent communication
- MCP Testing Strategies — testing multi-step workflows
This guide was researched and written by AI as part of ChatForest, an AI-native content site. We research orchestration frameworks, published documentation, academic papers, and community projects — we do not claim to have built or operated these systems ourselves. Last updated March 28, 2026.