The Milvus MCP server is the official tool for connecting AI coding agents to Milvus, the open-source vector database that has become the most-starred in its category on GitHub with over 40,000 stars. Instead of writing Python scripts to manage collections and run similarity searches, your agent can create collections, insert data, run hybrid searches, and manage infrastructure — all through natural language.
It’s maintained by Zilliz (the company behind Milvus) at zilliztech/mcp-server-milvus. With 220 GitHub stars, 63 forks, and support for both stdio and SSE transport, it’s the most complete self-hosted vector database MCP server available. Milvus itself powers AI systems at NVIDIA, Salesforce, eBay, Airbnb, and DoorDash — over 10,000 enterprise teams in production.
This is our fourth vector database MCP server review after Chroma (3.5/5), Qdrant (3/5), and Pinecone (3/5). The Milvus server matches Chroma’s ambition with a different set of strengths.
What It Does
The server exposes 12 tools organized across three categories:
Search & Query (5 tools)
milvus_text_search— full-text search across documentsmilvus_vector_search— vector similarity searchmilvus_hybrid_search— combined text + vector search in a single querymilvus_text_similarity_search— text similarity using embedded functions (requires Milvus 2.6.0+)milvus_query— filter-based queries with expressions
Collection Management (5 tools)
milvus_list_collections— view all collectionsmilvus_create_collection— create with customizable schemasmilvus_get_collection_info— retrieve schema, properties, and metadatamilvus_load_collection— load collections into memory for searchmilvus_release_collection— unload collections to free memory
Data Operations (2 tools)
milvus_insert_data— add records to collectionsmilvus_delete_entities— remove entities via filter expressions
The standout feature is five search modes. No other vector database MCP server comes close. Chroma offers semantic + full-text + regex through a single tool. Qdrant has only semantic search. Pinecone has text search + metadata filtering. Milvus gives you full-text search, vector similarity, hybrid search (combining both), text similarity via embedded functions, and filter-based queries — each with its own dedicated tool.
Hybrid search is particularly valuable. Instead of choosing between keyword matching and semantic similarity, your agent can combine both in a single query. Milvus 2.5 made this native — no separate search infrastructure needed. For RAG pipelines where you need both precision (exact term matching) and recall (semantic understanding), hybrid search is the answer.
The memory management tools (load_collection and release_collection) are unique in this category. Milvus requires collections to be loaded into memory before searching — these tools give your agent direct control over that lifecycle. For production environments with many collections, being able to load only what’s needed (and release what’s not) is meaningful memory optimization.
Setup
The recommended setup uses uv without installation:
Stdio mode (default):
{
"mcpServers": {
"milvus": {
"command": "uvx",
"args": ["mcp-server-milvus", "--milvus-uri", "http://localhost:19530"]
}
}
}
SSE mode (for remote clients):
{
"mcpServers": {
"milvus": {
"command": "uvx",
"args": ["mcp-server-milvus", "--milvus-uri", "http://localhost:19530", "--transport", "sse"]
}
}
}
Zilliz Cloud:
{
"mcpServers": {
"milvus": {
"command": "uvx",
"args": ["mcp-server-milvus"],
"env": {
"MILVUS_URI": "https://your-endpoint.zillizcloud.com",
"MILVUS_TOKEN": "your-token"
}
}
}
}
Configuration is clean — three environment variables (MILVUS_URI, MILVUS_TOKEN, MILVUS_DB) or equivalent command-line arguments. A .env file is also supported. The same server binary works for self-hosted Milvus and Zilliz Cloud, which is a nice touch — change the URI and token, everything else stays the same.
The catch: you need a running Milvus instance. Unlike Chroma (ephemeral in-memory mode) or Qdrant (local embedded mode via QDRANT_LOCAL_PATH), there’s no way to spin up Milvus through the MCP server alone. You’ll need Docker or a Zilliz Cloud account before you can use this server at all.
What’s Good
Five search modes — category-leading. Text search, vector search, hybrid search, text similarity, and filter queries. Each gets its own tool with dedicated parameters. This is the broadest search capability of any vector database MCP server. For agents building RAG pipelines, having the right search mode matters more than having the most tools.
Hybrid search is genuinely useful. Milvus 2.5 unified lexical and semantic retrieval natively. Through this MCP server, your agent can run hybrid queries that combine keyword precision with semantic recall — without maintaining two separate indexes or search backends. This is the capability that convinced us hybrid search is the future of RAG, not just a feature checkbox.
Full delete capability. milvus_delete_entities takes filter expressions, letting your agent selectively remove data. Qdrant MCP has no delete at all. Pinecone MCP has no delete. Chroma MCP has delete-by-ID or filter, which is comparable. Being able to remove stale or incorrect data is essential for any production vector pipeline.
Memory management controls. load_collection and release_collection give agents explicit control over which collections are in memory. This is specific to Milvus’s architecture (collections must be loaded before querying), but it’s a genuine operational advantage. Your agent can load a collection, run searches, and release it — rather than keeping everything loaded and consuming RAM.
Works with both self-hosted and cloud. Same server, same tools, different connection string. Self-hosted Milvus on your infrastructure or Zilliz Cloud — the MCP experience is identical. This flexibility is particularly good for teams that develop locally and deploy to cloud.
SSE transport. One of only two vector database MCP servers with SSE support (Qdrant being the other). This enables remote MCP connections, which matters for team environments where multiple developers need access to the same vector infrastructure.
What’s Not
No embedded/local mode. This is the server’s biggest practical limitation. Chroma has ephemeral mode (in-memory, zero setup). Qdrant has QDRANT_LOCAL_PATH (embedded, no server needed). Milvus requires a running instance — either Docker (milvus-standalone) or Zilliz Cloud. For quick prototyping or adding semantic memory to a coding agent, this setup overhead is a dealbreaker.
No document update. You can insert and delete, but you can’t update existing entities in-place. To modify a record, you must delete and re-insert. Chroma MCP has update_documents. For iterative RAG pipeline development where your agent is refining document content or metadata, the delete-and-reinsert cycle is friction.
Pre-release maturity. 35 commits. No versioned releases. The last update was December 2025 — three months ago. Six open issues with no assignees, including an OOM exception report (#44) and a bug where the service becomes unresponsive after errors (#51). This isn’t a shipped product yet; it’s a maintained prototype.
No Streamable HTTP transport. SSE is good but the MCP ecosystem is moving toward Streamable HTTP. Qdrant already supports it. There’s an open feature request (#40) for Streamable HTTP, but no response from maintainers. For forward-looking deployments, this gap matters.
Python-only. Requires Python 3.10+ and the uv package manager. No npm package, no Go binary. If your development stack is Node.js or Go, you’ll need Python infrastructure for this server. Pinecone MCP is TypeScript-based, which integrates more naturally into JavaScript workflows.
No embedding configuration. The server relies on Milvus’s built-in embedding functions or pre-computed embeddings. Unlike Chroma MCP (six embedding providers: OpenAI, Cohere, Jina, VoyageAI, Roboflow, Default), there’s no MCP-level control over which embedding model to use. You configure embeddings in Milvus itself, not through the MCP server.
Service hang bug. Issue #51 reports that after an error occurs, the MCP service gets stuck and cannot accept new requests. For long-running agent sessions, an unrecoverable error state is a serious reliability concern.
How It Compares
| Feature | Milvus MCP | Chroma MCP | Qdrant MCP | Pinecone MCP |
|---|---|---|---|---|
| Stars | 220 | 515 | 1,300 | 56 |
| Tools | 12 | 13 | 2 | 9 |
| Transport | stdio, SSE | stdio only | stdio, SSE, Streamable HTTP | stdio |
| Search types | 5 (text, vector, hybrid, similarity, filter) | 3 (semantic, full-text, regex) | 1 (semantic) | 2 (text, metadata) |
| Collection CRUD | Full (create, list, info, load, release) | Full + fork | Auto-create only | Read-only |
| Document insert | Yes | Yes | Yes | Yes |
| Document delete | Yes (by filter) | Yes (by ID/filter) | No | No |
| Document update | No | Yes | No | No |
| Hybrid search | Yes (native) | No | No | No |
| Reranking | No | No | No | Yes |
| Local/embedded mode | No | Yes (ephemeral + persistent) | Yes (local path) | No |
| Cloud mode | Yes (Zilliz Cloud) | Yes (Chroma Cloud) | Yes (Qdrant Cloud) | Yes (only) |
| Memory management | Yes (load/release) | No | No | No |
| Embedding config | External | 6 providers | FastEmbed | Integrated |
| Language | Python | Python | Python | TypeScript |
| Maturity | Pre-release | Beta | Stable | v0.2.1 |
Milvus wins on search breadth (five modes vs. three for Chroma) and is the only server with native hybrid search. Chroma wins on deployment flexibility (four modes, including ephemeral) and document operations (update + fork). Qdrant wins on transport support (all three) and adoption (1,300 stars). Pinecone wins on search quality (reranking) and maturity.
The Milvus MCP server’s strongest argument is for teams already running Milvus in production. If your vector infrastructure is Milvus, this is the most capable MCP interface to it — especially if you need hybrid search. If you’re starting fresh and want the lowest friction, Chroma (ephemeral mode) or Qdrant (embedded mode) will get you running faster.
The Bigger Picture
Milvus is the most-deployed open-source vector database, with 40,000+ GitHub stars and adoption at companies like NVIDIA, Salesforce, and eBay. Zilliz, the company behind it, also offers Zilliz Cloud (managed Milvus) and maintains a separate Zilliz Cloud MCP server with 16 tools including cluster management and Streamable HTTP transport.
The split between two MCP servers is worth noting. The Milvus MCP server handles database operations (what’s inside Milvus). The Zilliz Cloud MCP server handles infrastructure operations (the Milvus deployment itself). This makes architectural sense but creates confusion — the same vendor maintaining two MCP servers for the same database, each with different tool sets and transport support.
The pre-release maturity is also worth weighing against Milvus’s production reputation. The database itself is battle-tested at massive scale — GPU-accelerated indexing, HNSW/IVF/DiskANN, multi-tenancy, high availability. The MCP server exposes a fraction of those capabilities through 12 tools with 35 commits and known stability issues. There’s a gap between the database’s maturity and the MCP server’s maturity.
That said, what’s here is solid. Five search modes with native hybrid search is genuinely category-leading. Full collection CRUD plus delete fills gaps that Qdrant and Pinecone leave open. SSE transport enables remote access. And the fact that it works identically with self-hosted Milvus and Zilliz Cloud means you’re not locked into a deployment model.
For the vector database MCP space — still young, still fragmented — the Milvus MCP server is one of the two most capable options alongside Chroma. If hybrid search matters to your RAG pipeline (and it should), this is the server to use.
Rating: 3.5/5
The Milvus MCP server earns a 3.5/5 for bringing the most popular open-source vector database’s key capabilities to AI coding agents — 12 tools with category-leading search breadth (five modes including native hybrid search), full collection CRUD, delete support, memory management controls, and SSE transport. It loses points for requiring a running Milvus instance (no embedded/local mode), pre-release maturity (35 commits, no versioned releases, three months since last update), no document update capability, a known service hang bug (#51), and no Streamable HTTP transport. The database behind it is world-class; the MCP layer is catching up.
Use this if: You’re running Milvus in production and want AI-assisted vector operations, especially hybrid search combining keyword and semantic retrieval.
Skip this if: You want zero-setup vector memory (use Qdrant’s embedded mode), you need the most polished MCP experience (use Chroma), or your stack is JavaScript-based (use Pinecone’s TypeScript server).