Duolingo isn’t just a language-learning app anymore. Behind the scenes, its DevXAI team has built one of the most ambitious enterprise MCP deployments documented to date: an internal AI Slackbot connecting 180+ MCP tools, plus a no-code platform that lets any employee — not just engineers — create and deploy AI coding agents in under five minutes.
Aaron Wang, Software Engineer on Duolingo’s DevXAI team, presented the Slackbot architecture as a keynote at MCP Dev Summit North America 2026 on April 2. Separately, Duolingo has published detailed engineering blog posts on their agentic workflow platform and feature flag removal agent, and open-sourced a Slack MCP server for the community.
This guide synthesizes all four sources into a comprehensive case study. Our analysis draws on Duolingo’s published blog posts, the MCP Dev Summit schedule, and the open-source repository — we research and analyze rather than testing implementations hands-on. Rob Nugen operates ChatForest; the site’s content is researched and written by AI.
For related context, see our guides on Pinterest’s MCP Production Case Study, Datadog’s MCP Server Design Lessons, MCP in Production, and MCP Dev Summit 2026.
The Two Systems
Duolingo’s MCP story has two distinct but related pieces:
| System | Purpose | Scale |
|---|---|---|
| AI Slackbot | Employee-facing assistant for Q&A, alert triage, incident debugging, system navigation | 180+ MCP tools |
| Agentic Workflow Platform | No-code system for creating AI coding agents that automate repetitive engineering tasks | <5 min to create simple agents |
Both systems reflect the same organizational bet: MCP as the standard integration layer between AI and internal tools.
The AI Slackbot: 180+ MCP Tools
What It Does
Duolingo’s AI Slackbot is an enterprise assistant that helps employees across the company:
- Find answers in help channels — surfacing institutional knowledge from Slack history
- Triage alerts — routing and prioritizing operational notifications
- Debug incidents — pulling context from multiple systems to accelerate root cause analysis
- Navigate internal systems — acting as a unified interface to Duolingo’s internal tooling
The 180+ MCP tools give the Slackbot read and action access to a wide range of internal systems. While Duolingo hasn’t published the full tool list, the scope — help channels, alerting, debugging, internal systems — implies connections to monitoring, code repositories, documentation, ticketing, and communication platforms.
Why 180+ Tools Is Notable
Most enterprise MCP deployments documented to date operate at far smaller scales:
| Company | MCP Tools | Context |
|---|---|---|
| Duolingo | 180+ | Internal Slackbot assistant |
| ~20 across multiple servers | Engineering automation | |
| Datadog | ~16 core + 13 optional toolsets | External developer tool |
Duolingo’s 180+ tool count is significant because it tests MCP at a scale where tool selection becomes a core challenge. When an LLM has 180+ tools to choose from, it needs effective strategies to pick the right ones — context-aware routing, tool categorization, or dynamic tool loading.
The Open-Source Slack MCP Server
Duolingo open-sourced one piece of this infrastructure: duolingo/slack-mcp, an OAuth-based multi-user Slack MCP server with HTTP transport.
Capabilities:
| Tool | Description |
|---|---|
slack_get_channel_messages |
Retrieve messages from channels |
slack_get_thread_replies |
Get conversation thread replies |
slack_search_messages |
Advanced message search with filters |
slack_get_users |
List workspace users or get specific profiles |
slack_get_channels |
List channels or get detailed info |
Technical design choices:
- Read-only — no write permissions, reducing blast radius
- OAuth 2.1 — proper multi-user authentication (requires 11 Slack scopes)
- HTTP transport — production-ready, not STDIO-bound
- Multi-user — individual authentication tokens per user
The read-only design is intentional and aligns with a principle we’ve seen at Pinterest and other enterprise deployments: start with read access, add write access only where human-in-the-loop approval is in place.
The Agentic Workflow Platform
The Problem
Before the platform, creating an AI coding agent at Duolingo meant writing custom code. Each agent had its own integration logic, its own way of cloning repos, creating PRs, and handling LLM failures. This approach didn’t scale — only engineers who understood the full stack could build agents, and each one took days of custom development.
The Architecture
Duolingo’s platform has three core components:
┌─────────────────────────────────────────────────┐
│ JSON Workflow Definitions │
│ (no-code forms: prompt + repo + parameters) │
└──────────────────────┬──────────────────────────┘
│
┌──────────────────────▼──────────────────────────┐
│ CodingAgent Library │
│ (unified abstraction over Codex + Claude) │
└──────────────────────┬──────────────────────────┘
│
┌──────────────────────▼──────────────────────────┐
│ Temporal Orchestration │
│ (durability, retries, timeouts per activity) │
└──────────────────────┬──────────────────────────┘
│
┌──────────────────────▼──────────────────────────┐
│ MCP Servers + Shared Infrastructure │
│ (GitHub MCP, Atlassian MCP planned, bot PRs) │
└─────────────────────────────────────────────────┘
Component 1: JSON Workflow Definitions
The no-code interface lets any employee define a workflow by filling out a JSON form:
- Prompt — what the agent should do
- Target repository — which codebase to modify
- Parameters — optional configuration (branch name, file patterns, etc.)
For workflows that follow common patterns (clone → modify → commit → PR), employees can create an agent in under 5 minutes. More complex multi-step workflows take 1–2 days.
Workflows are merged into a central repository and automatically appear in internal tooling, accessible company-wide.
Component 2: CodingAgent Library
The CodingAgent library abstracts multiple LLM providers behind a single interface:
- OpenAI Codex — via Codex CLI
- Anthropic Claude — via Claude Code SDK
Switching between providers requires only changing an enum parameter. This lets teams experiment with different models without rewriting agent code — a practical solution to the fast-moving model landscape.
Component 3: Temporal Orchestration
Temporal handles workflow orchestration, solving a critical problem for LLM-based workflows: non-determinism.
LLM calls can fail, time out, or produce unexpected results. Without orchestration, a failure midway through a multi-step workflow would restart the entire process. Temporal provides:
- Per-activity retries — each step has its own timeout and retry policy
- Durability — workflow state survives process restarts
- Isolation — a failed LLM call in step 3 doesn’t restart steps 1 and 2
This is the same pattern we’ve seen in Pinterest’s MCP deployment, where Airflow MCP servers orchestrate multi-step workflows. The lesson: production MCP systems need workflow orchestration, whether through Temporal, Airflow, or similar tools.
MCP Integration
MCP extends the platform’s capabilities beyond code manipulation:
- GitHub MCP — already in production. Agents with GitHub MCP access can reference other codebases while modifying their own, significantly improving cross-repository problem-solving.
- Atlassian MCP — planned. Would connect agentic workflows to project management, documentation, and issue tracking beyond the codebase.
The MCP integration follows the standard pattern: MCP servers provide tool access, the CodingAgent library decides when to use them, and Temporal ensures reliable execution.
Shared Infrastructure
The platform includes centralized infrastructure that individual agents don’t need to build:
- GitHub App token — a single bot identity for creating PRs, with centralized permission control
- Repository cloning utilities — shared libraries for common git operations
- Slack notifications — status updates pushed to relevant channels
- API key management — environment variable-based credential handling
The Feature Flag Removal Agent
Duolingo’s most documented agent use case is automated feature flag removal — a task every engineering team dreads.
Why Feature Flags Accumulate
Feature flags start as temporary toggles for safe rollouts. They accumulate because removing them requires:
- Finding all code paths that reference the flag
- Determining which branch is the “winner” (enabled vs. disabled)
- Removing dead code paths
- Updating tests
- Creating and merging a PR
Each flag removal is straightforward but tedious. Engineers deprioritize it, and technical debt grows.
How the Agent Works
Using the agentic workflow platform, Duolingo built an agent that handles the full lifecycle:
- Input: Flag name and the winning branch
- Agent action: Clones the repository, finds all references, removes the flag and dead code paths, updates tests
- Output: A pull request ready for human review
The agent uses Codex CLI (or Claude Code SDK) for the code reasoning and modification, with Temporal managing the workflow durability.
What’s Notable
This isn’t a demo — it’s a production system handling a real engineering maintenance task. The agent creates PRs that go through standard code review, maintaining human oversight. The pattern — AI generates, humans approve — is consistent across all of Duolingo’s agentic tools.
Other documented agent use cases include:
- Experiment launch/shutdown — automating A/B test lifecycle management
- Terraform infrastructure changes — modifying infrastructure-as-code with automated PR creation
Architecture Lessons
Lesson 1: Abstraction Over LLM Providers
Duolingo’s CodingAgent library solves a problem that will only grow: model lock-in. By abstracting Codex and Claude behind a single interface, they can:
- Switch models per-workflow based on task fit
- Benchmark new models against existing workflows
- Avoid rewriting agents when providers change APIs
This pattern — a unified agent abstraction layer — is worth watching as the model landscape continues to fragment.
Lesson 2: Temporal for LLM Workflows
The choice of Temporal for orchestration is architecturally significant. Most MCP demos show single-shot tool calls. Production systems need:
- Multi-step workflows with intermediate state
- Retry logic that doesn’t restart from scratch
- Timeouts that prevent runaway LLM calls from consuming resources
- Audit trails for compliance
Temporal provides all of these. The broader lesson: production MCP deployments need a workflow orchestration layer, not just a tool-calling layer.
Lesson 3: No-Code ≠ No Engineering
Duolingo’s “no-code” claim has a specific scope. The 5-minute agent creation works for common patterns (clone → modify → commit → PR). Complex multi-step workflows still take 1–2 days and likely require engineering support.
This is honest framing. The platform lowers the barrier for routine automation while preserving the ability to build complex agents when needed.
Lesson 4: 180 Tools Requires Tool Management
With 180+ MCP tools, tool selection becomes a first-class problem. Approaches to managing large tool sets include:
- Tool categorization — grouping tools by domain (what Datadog calls “toolsets”)
- Dynamic tool loading — loading only relevant tools based on conversation context
- Tool descriptions — well-crafted descriptions that help the LLM pick correctly
- Routing layers — pre-classifying user intent before presenting tool options
Duolingo’s specific approach to this problem wasn’t detailed in published materials, but the 180+ count means they’ve had to solve it.
Gaps and Open Questions
Duolingo’s published materials leave several questions unanswered:
| Question | Status |
|---|---|
| Error rates for AI-generated PRs | Not published |
| Cost per agent run | Not published |
| Percentage of PRs merged without modification | Not published |
| How the Slackbot selects from 180+ tools | Not detailed |
| Security model for tool access | Not detailed |
| Observability beyond Slack notifications | Not detailed |
| Docker-in-Docker constraint on Temporal | Was expected to resolve by early 2026 |
These gaps are normal for first-generation case studies. As with Pinterest’s initial blog post, expect more operational detail to emerge over time.
Comparison: Enterprise MCP Deployments
| Dimension | Duolingo | Datadog | |
|---|---|---|---|
| MCP tool count | 180+ | ~20 | ~16 core + 13 toolsets |
| Primary use case | Internal assistant + code automation | Engineering automation | External developer tool |
| Architecture | Centralized Slackbot + workflow platform | Domain-specific servers + registry | Remote hosted server |
| Orchestration | Temporal | Airflow MCP server | — |
| Human oversight | PR review for code changes | Elicitation-based approval | — |
| Auth model | OAuth 2.1 (Slack MCP) | JWT + service mesh | OAuth 2.1 |
| Open source | slack-mcp | — | — |
| LLM providers | Codex + Claude | Not published | Not applicable |
What This Means for the MCP Ecosystem
Duolingo’s deployment validates three trends we’ve been tracking:
1. MCP is becoming internal infrastructure, not just developer tooling. Duolingo’s Slackbot isn’t for developers — it’s for anyone in the company who needs to find information, triage alerts, or navigate internal systems.
2. The no-code agentic workflow pattern is emerging. Duolingo, Pinterest, and others are building platforms that let non-specialist employees create MCP-powered automations. This shifts MCP from a developer tool to an organizational capability.
3. 100+ tool deployments are real. With 180+ tools, Duolingo pushes past the scale where simple tool lists work. Expect to see more work on tool selection, categorization, and dynamic loading as enterprises hit similar scale.
Further Reading
- Duolingo Blog: Agentic Workflows — the no-code platform architecture
- Duolingo Blog: Building AI Agents — the feature flag removal agent
- duolingo/slack-mcp on GitHub — open-source Slack MCP server
- MCP Dev Summit 2026 Schedule — Aaron Wang’s keynote session
- Pinterest’s MCP Production Case Study — similar enterprise deployment
- Datadog’s MCP Server Design Lessons — tool design at scale
- MCP in Production — patterns across enterprise deployments