The Grafana MCP server gives AI agents direct access to your Grafana instance and the surrounding observability ecosystem — dashboards, Prometheus metrics, Loki logs, ClickHouse analytics, CloudWatch metrics, Elasticsearch searches, alerting rules, incident management, OnCall schedules, and Sift investigations. All from one server.
It’s official. Grafana Labs builds and maintains it at grafana/mcp-grafana. With 2,500 GitHub stars, 294 forks, 473 commits, and 15+ releases since December 2025, it’s the most popular observability MCP server by community adoption, with more than four times Sentry’s 579 stars. Written in Go, licensed Apache 2.0.
This is the second observability MCP server we’ve reviewed after Sentry (4/5). Where Sentry is deep and narrow — laser-focused on error tracking with proprietary AI analysis — Grafana is wide and extensible. It connects to whatever data sources your Grafana instance already has, which could be dozens of backends. The trade-off: breadth over depth.
What It Does
The server exposes 40+ tools across 15 configurable categories. Several categories are disabled by default to manage context window size — you enable only what you need.
Dashboard Management (enabled by default)
- search_dashboards: find dashboards by title or metadata
- get_dashboard_by_uid: retrieve full dashboard JSON
- get_dashboard_summary: compact overview without the full JSON (recommended for context efficiency)
- get_dashboard_property: extract specific parts via JSONPath expressions
- update_dashboard: create or modify dashboards
- patch_dashboard: apply targeted changes without needing the full JSON
- get_dashboard_panel_queries: extract panel titles, queries, and datasource UIDs
Dashboard Query Execution (disabled by default)
- run_panel_query: execute a dashboard panel’s query with custom time ranges and variable overrides, supporting Prometheus, Loki, ClickHouse, and CloudWatch datasources
Datasource Operations (enabled by default)
- list_datasources: view all configured datasources
- get_datasource: retrieve datasource details by UID or name
Prometheus Querying (enabled by default)
- query_prometheus: execute PromQL instant and range queries
- list_prometheus_metric_metadata: retrieve metric metadata
- list_prometheus_metric_names: list available metrics
- list_prometheus_label_names: query labels matching selectors
- list_prometheus_label_values: retrieve specific label values
- query_prometheus_histogram: calculate percentiles (p50, p90, p95, p99)
Loki Querying (enabled by default)
- query_loki_logs: run LogQL log and metric queries
- list_loki_label_names: list available log labels
- list_loki_label_values: retrieve label value lists
- query_loki_stats: get stream statistics
- query_loki_patterns: identify common log structures
ClickHouse Querying (disabled by default)
- list_clickhouse_tables: list database tables with row counts and sizes
- describe_clickhouse_table: get column names, types, and metadata
- query_clickhouse: execute SQL with macro and variable substitution
CloudWatch Querying (disabled by default)
- list_cloudwatch_namespaces: discover AWS namespaces
- list_cloudwatch_metrics: list namespace metrics
- list_cloudwatch_dimensions: get metric dimensions
- query_cloudwatch: execute CloudWatch metric queries
Elasticsearch Querying (disabled by default)
- query_elasticsearch: execute searches via Lucene syntax or Query DSL with time range support
Log Search (disabled by default)
- search_logs: high-level log search across ClickHouse (OTel) and Loki
Incident Management (enabled by default)
- list_incidents: view incidents in Grafana Incident
- create_incident: create new incidents
- add_activity_to_incident: add activity items to incidents
- get_incident: retrieve specific incident details
Sift Investigations (enabled by default)
- list_sift_investigations: retrieve investigations
- get_sift_investigation: get investigation details by UUID
- get_sift_analyses: retrieve specific analyses
- find_error_patterns_in_logs: detect elevated errors in Loki
- find_slow_requests: detect slow requests via Tempo traces
Alerting (enabled by default)
- alerting_manage_rules: list, get, create, update, and delete alert rules
- alerting_manage_routing: manage notification policies, contact points, and time intervals
- Supports both Grafana-managed and datasource-managed rules (Prometheus/Loki)
OnCall (enabled by default)
- list_oncall_schedules: view on-call schedules
- get_oncall_shift: retrieve shift details
- get_current_oncall_users: see who’s on call right now
- list_oncall_teams / list_oncall_users: view teams and users
- list_alert_groups: filter alerts by state, integration, labels, or time range
- get_alert_group_details: retrieve specific alert group information
Navigation (enabled by default)
- generate_deeplinks: create accurate URLs for dashboards, panels, and Explore views, with time ranges and query parameters
Annotations (enabled by default)
- get_annotations: query annotations by time range, dashboard UID, or tags
- create_annotation / create_graphite_annotation: create dashboard or Graphite annotations
- update_annotation / patch_annotation: full or partial annotation updates
- get_annotation_tags: list tags with optional filtering
Rendering (enabled by default)
- get_panel_image / get_dashboard_image: render panels or dashboards as PNG images (base64 encoded), with customizable dimensions, time ranges, themes, and variables
Admin Management (disabled by default)
- list_teams / list_users_by_org: view teams and users
- list_all_roles / get_role_details / get_role_assignments: inspect RBAC roles
- list_user_roles / list_team_roles: view role assignments
- get_resource_permissions / get_resource_description: inspect resource-level permissions
Query Examples (disabled by default)
- get_query_examples: retrieve example queries for datasource types
The configurable categories are the key design decision. Grafana’s full tool surface would consume ~16K tokens of context window — far too much for most agents. The --disable-<category> and --enabled-tools flags let you trim this to exactly the tools you need. Want just Prometheus and dashboards? Disable everything else. Want incident response? Enable incidents and OnCall, disable querying. This is the most granular tool management of any MCP server we’ve reviewed.
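To make the "Prometheus and dashboards only" case concrete, here is a minimal sketch that turns a keep-list into --disable-<category> flags. The category names below are illustrative assumptions, not a verified list; check the server’s --help output for the real ones before relying on them.

```python
# Category names are ASSUMPTIONS for illustration; confirm them with the
# server's --help output before using the generated flags.
ASSUMED_CATEGORIES = [
    "dashboard", "datasource", "prometheus", "loki", "incident",
    "sift", "alerting", "oncall", "navigation",
]


def disable_flags(keep, categories=ASSUMED_CATEGORIES):
    """Return a --disable-<category> flag for every category not in `keep`."""
    unknown = set(keep) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [f"--disable-{name}" for name in categories if name not in keep]


# Keep only Prometheus and dashboards; everything else gets a disable flag.
args = ["mcp-grafana"] + disable_flags({"prometheus", "dashboard"})
print(args)
```

The resulting list can be dropped into the "args" array of an MCP client config. Rejecting unknown names up front avoids silently generating a flag the server does not recognize.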
Setup
Grafana offers four installation methods:
UV (recommended for local use):
{
  "mcpServers": {
    "grafana": {
      "command": "uvx",
      "args": ["mcp-grafana"],
      "env": {
        "GRAFANA_URL": "http://localhost:3000",
        "GRAFANA_SERVICE_ACCOUNT_TOKEN": "<your-token>"
      }
    }
  }
}
Docker:
{
  "mcpServers": {
    "grafana": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "GRAFANA_URL=http://host.docker.internal:3000",
        "-e", "GRAFANA_SERVICE_ACCOUNT_TOKEN=<your-token>",
        "mcp/grafana"
      ]
    }
  }
}
Also available as a native Go binary and via Helm chart for Kubernetes deployments.
The server supports stdio (default), SSE, and Streamable HTTP transports. SSE runs on port 8000 by default and supports TLS. This means you can run it as a shared service — one Grafana MCP instance serving multiple agents or team members.
Authentication uses Grafana service account tokens — you create a service account in your Grafana instance with the appropriate RBAC permissions. Each tool requires specific permissions (e.g., dashboards:read for viewing, datasources:query for querying, alert.rules:write for managing alerts). For a quick start, the Editor built-in role covers most operations.
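Before wiring a token into an MCP client, it can help to confirm that it works against Grafana’s HTTP API directly. The sketch below only builds the request; it assumes the standard dashboard search endpoint (/api/search) and Bearer-token auth, and the URL, token placeholder, and search term are examples rather than values from any real instance.

```python
import urllib.parse
import urllib.request


def build_search_request(base_url, token, query):
    """Prepare (but do not send) a dashboard search request with a Bearer token."""
    params = urllib.parse.urlencode({"query": query, "type": "dash-db"})
    url = f"{base_url.rstrip('/')}/api/search?{params}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder values; substitute your own instance URL and service account token.
req = build_search_request("http://localhost:3000", "<your-token>", "latency")
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it. An HTTP 401 or 403 at that point suggests the token or its RBAC permissions need attention before the MCP server will behave any better.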
A --disable-write flag provides read-only mode — preventing any write operations like creating dashboards, modifying alerts, or creating incidents. This is essential for production environments where you want to give agents observability access without the ability to change anything.
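If read-only mode is a hard requirement, a small check over the client config can catch entries that forgot the flag. This is a sketch under the assumption that the config follows the mcpServers layout shown earlier; the server names grafana-prod and grafana-dev are hypothetical.

```python
def servers_allowing_writes(config):
    """Return names of server entries whose args do not include --disable-write."""
    return [
        name
        for name, entry in config.get("mcpServers", {}).items()
        if "--disable-write" not in entry.get("args", [])
    ]


# Hypothetical config with one read-only entry and one unrestricted entry.
config = {
    "mcpServers": {
        "grafana-prod": {"command": "uvx", "args": ["mcp-grafana", "--disable-write"]},
        "grafana-dev": {"command": "uvx", "args": ["mcp-grafana"]},
    }
}
print(servers_allowing_writes(config))
```

A check like this fits naturally in CI or a pre-commit hook for shared config files. It only inspects the config, so the service account’s RBAC permissions remain the actual enforcement boundary.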
What’s Good
The most comprehensive observability MCP server that isn’t locked to a single vendor. Grafana’s core value proposition is that it works with whatever backends you already use. Prometheus, Loki, Tempo, ClickHouse, CloudWatch, Elasticsearch — the MCP server inherits Grafana’s multi-datasource architecture. Datadog’s MCP server has more tools (50+), but they all query Datadog. Grafana’s tools query your existing infrastructure regardless of vendor.
Configurable tool categories prevent context window bloat. At 40+ tools and ~16K tokens of tool descriptions, you’d waste your context budget loading everything. The --disable-<category> flags let you present only what matters. This is better than Sentry (which loads all ~20 tools every time) and smarter than AWS MCP (which has role-based configurations but less granular control). The --enabled-tools flag goes even further, letting you cherry-pick individual tools.
Dashboard intelligence beyond CRUD. The get_dashboard_summary and get_dashboard_property (with JSONPath) tools are designed specifically for AI agents. Instead of dumping a 2,000-line dashboard JSON into context, you can extract exactly the panels, queries, or metadata you need. The patch_dashboard tool for targeted modifications without needing the full JSON is similarly agent-aware. This level of context-conscious design is rare in MCP servers.
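To see why a summary saves context, consider what a compact view of a dashboard might keep. The sketch below is not the server’s implementation: it assumes the common dashboard JSON layout (a top-level panels list, with collapsed rows nesting their children under panels) and keeps only panel IDs, titles, and Prometheus-style expr queries. The sample dashboard is made up.

```python
import json


def summarize(dashboard):
    """Reduce a dashboard JSON object to its title, UID, and per-panel queries."""
    panels = []
    for panel in dashboard.get("panels", []):
        # Collapsed rows keep their child panels in a nested "panels" list.
        for p in [panel, *panel.get("panels", [])]:
            if p.get("type") == "row":
                continue
            panels.append({
                "id": p.get("id"),
                "title": p.get("title"),
                "queries": [t.get("expr") for t in p.get("targets", [])],
            })
    return {"uid": dashboard.get("uid"), "title": dashboard.get("title"), "panels": panels}


# Made-up dashboard for illustration only.
dashboard = {
    "uid": "abc123",
    "title": "API latency",
    "panels": [
        {
            "id": 1, "type": "timeseries", "title": "p99 latency",
            "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
            "fieldConfig": {"defaults": {"unit": "s"}, "overrides": []},
            "targets": [{"refId": "A", "expr": "histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"}],
        },
        {
            "id": 2, "type": "row", "title": "Errors",
            "panels": [
                {"id": 3, "type": "stat", "title": "5xx rate",
                 "targets": [{"refId": "A", "expr": "sum(rate(http_requests_total[5m]))"}]},
            ],
        },
    ],
}

summary = summarize(dashboard)
print(len(json.dumps(dashboard)), "->", len(json.dumps(summary)), "characters")
```

Layout and styling fields such as gridPos and fieldConfig are what make real dashboards run to thousands of lines, and an agent reasoning about queries rarely needs them.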
Incident-to-investigation pipeline. Combining incidents, OnCall, Sift investigations, and alerting in one server means an agent can follow the complete incident lifecycle: get paged via OnCall → pull incident details → run Sift investigations to find error patterns and slow requests → check relevant dashboards → create annotations marking the incident timeline. No other single MCP server covers this full loop.
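The lifecycle above can be written down as an ordered plan of tool calls. This is a sketch only: call_tool is a hypothetical stand-in for whatever your MCP client uses to invoke a tool, and arguments are omitted because this review does not document each tool’s parameters.

```python
# Ordered plan using tool names listed earlier in this review.
INCIDENT_WORKFLOW = [
    ("get_current_oncall_users", "see who is on call"),
    ("get_incident", "pull incident details"),
    ("find_error_patterns_in_logs", "look for elevated errors in Loki"),
    ("find_slow_requests", "look for slow requests in Tempo traces"),
    ("search_dashboards", "locate relevant dashboards"),
    ("create_annotation", "mark the incident on the timeline"),
]


def run_workflow(call_tool, workflow=INCIDENT_WORKFLOW):
    """Invoke each tool in order and collect results keyed by tool name."""
    results = {}
    for tool, _purpose in workflow:
        results[tool] = call_tool(tool)  # call_tool is HYPOTHETICAL; real calls need arguments
    return results


# Stand-in client that just records which tools were requested.
called = []
results = run_workflow(lambda tool: called.append(tool) or f"<{tool} result>")
print(called)
```

The final step is a write operation, so a server started with --disable-write would presumably reject it and the plan would need to stop at the dashboard lookup.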
Real-time rendering. The get_panel_image and get_dashboard_image tools render actual Grafana visualizations as PNGs that agents can analyze. This is uniquely powerful — instead of just getting metric numbers, your agent can see the same graphs a human would see in the dashboard.
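Since the images arrive base64 encoded, a client that wants to save or inspect one has to decode it first. Here is a minimal sketch that decodes a payload and checks the PNG file signature. How the payload is extracted from the tool response depends on your MCP client and is not shown; the sample payload is synthetic.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_png(b64_payload):
    """Decode a base64 string and verify that it starts with the PNG signature."""
    data = base64.b64decode(b64_payload, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return data


# Synthetic payload: a PNG signature followed by placeholder bytes, not a real image.
sample = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
image_bytes = decode_png(sample)
print(len(image_bytes), "bytes decoded")
```

Checking the signature is cheap and catches the case where an error message or an empty body comes back in place of the image.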
Active development with weekly releases. From v0.7.10 (December 2025) to v0.11.3 (March 2026) — 15+ releases in under 4 months, adding ClickHouse, CloudWatch, Elasticsearch, panel query execution, alerting consolidation, and image rendering. The pace is fast and the changelog shows substantial features, not just patch fixes.
Open source, Apache 2.0. The entire codebase is readable, forkable, and extensible. If Grafana’s server doesn’t query your niche datasource, you can add it yourself. This matters more for observability than most categories — teams have strong opinions about their monitoring stacks.
What’s Not
61 open issues with real bugs. These aren’t just feature requests:
- query_prometheus fails with 500 errors for datasources configured with httpMethod: GET (#632)
- TextConsumer parsing errors with Grafana v12 (#635)
- Tool parameters use camelCase JSON names, breaking MCP clients that send snake_case (#641)
- query_loki_logs silently truncates results without indicating data was omitted (#557)
- get_dashboard_panel_queries omits non-Prometheus panels (#585)
- get_trace tool returns unbounded responses, needing filtering/truncation (#603)
- 403 Forbidden errors on Prometheus/Loki query tools despite correct permissions (#524)
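Until the silent truncation in query_loki_logs (#557) is fixed, a client can guard against it heuristically by treating any result that exactly fills the requested limit as possibly cut off and splitting the time window. In this sketch, fetch is a hypothetical wrapper around the tool that takes integer timestamps and a limit; the in-memory fake_fetch stands in for it.

```python
def fetch_all(fetch, start, end, limit):
    """Fetch entries in [start, end), halving the window whenever a result fills the limit."""
    entries = fetch(start, end, limit)
    # Fewer entries than the limit means nothing was cut off. A window of width 1
    # cannot be split further, so its (possibly truncated) result is returned as-is.
    if len(entries) < limit or end - start <= 1:
        return entries
    mid = (start + end) // 2
    return fetch_all(fetch, start, mid, limit) + fetch_all(fetch, mid, end, limit)


# In-memory stand-in for the real tool: returns at most `limit` timestamps, oldest first.
LOG_TIMESTAMPS = list(range(0, 100, 3))


def fake_fetch(start, end, limit):
    return [t for t in LOG_TIMESTAMPS if start <= t < end][:limit]


recovered = fetch_all(fake_fetch, 0, 100, limit=5)
print(len(recovered), "of", len(LOG_TIMESTAMPS), "entries recovered")
```

The cost is extra queries whenever a window happens to contain exactly the limit, which seems an acceptable price for not losing log lines without noticing.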
Security findings remain open. Issue #608 reports an AgentAudit scan finding TLS bypass and credential exposure in panic stack traces. A security policy template has been requested (#623) but doesn’t exist yet. For a server that connects to your production monitoring infrastructure, this matters.
Service account token auth is less secure than OAuth. Sentry’s MCP server uses OAuth 2.0 — you authenticate in your browser, tokens are scoped and revocable, nothing sensitive sits on disk. Grafana requires a service account token in your MCP client config file, typically in plaintext JSON. This is the standard approach for most MCP servers, but it’s a step behind Sentry and PagerDuty which offer OAuth flows.
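One partial mitigation for a plaintext token is to make sure the config file is readable only by its owner. The sketch below checks POSIX permission bits, so it does not apply on Windows, and the path you pass in depends on your MCP client.

```python
import os
import stat


def is_private(path):
    """True if group and others have no permissions on the file (POSIX mode bits)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def make_private(path):
    """Restrict the file to owner read/write only, the equivalent of chmod 600."""
    os.chmod(path, 0o600)
```

This does not bring the setup up to OAuth’s level, since the token still sits on disk and has to be rotated manually, but it does keep other local users from reading it.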
No hosted remote server. Sentry has mcp.sentry.dev. Datadog has a hosted endpoint. PagerDuty has mcp.pagerduty.com. Grafana requires you to run the MCP server yourself — locally via uvx/Docker, or as a service via SSE/Streamable HTTP. This adds setup complexity and means you need to keep the server process running.
Requires Grafana 9.0+ for full functionality. The server relies on API endpoints introduced in Grafana 9.0. Earlier versions will silently fail on certain operations, particularly datasource-related tools. Given Grafana’s rapid release cycle, most instances should be 9.0+, but legacy deployments will hit issues.
16K token tool description footprint. Even with category disabling, the server’s instructions are large (#569). If you enable all categories, you’re consuming a significant chunk of your agent’s context window before any actual data enters. This is an acknowledged issue the team is working to reduce.
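How much of a problem 16K tokens is depends on the model’s context window. The arithmetic below uses the review’s approximate figure; the window sizes are illustrative assumptions, not measurements of any particular model.

```python
TOOL_DESCRIPTION_TOKENS = 16_000  # approximate footprint cited in this review


def context_share(window_tokens, overhead=TOOL_DESCRIPTION_TOKENS):
    """Fraction of the context window consumed by tool descriptions alone."""
    return overhead / window_tokens


# Illustrative window sizes; substitute your model's actual limit.
for window in (32_000, 128_000, 200_000):
    print(f"{window:>7} token window: {context_share(window):.1%} used before any data")
```

On a 128K window that is 12.5% of the budget gone before the first query result arrives, and on a 32K window it is half.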
Some categories require Grafana Cloud features. Sift investigations, incidents, and OnCall are Grafana Cloud features. If you’re running self-hosted open-source Grafana, these tool categories will be available but non-functional. The server doesn’t clearly indicate which tools require Cloud vs. open-source Grafana.
Alternatives
Sentry MCP Server (4/5) — if you need deep error tracking with AI root cause analysis (Seer), OAuth authentication, and zero-install remote hosting. Sentry is narrower (errors only) but deeper in its niche. Use Sentry for debugging specific errors, Grafana for broader observability.
Datadog MCP Server — the enterprise alternative with 50+ tools across 9 modular toolsets, including unique features like LLM observability and feature flag management. If you’re already on Datadog, their server covers more operational surface. But it locks you into Datadog’s ecosystem.
grafana/loki-mcp — Grafana’s dedicated Loki MCP server for deep log querying. If you only need logs, this is lighter weight than the full Grafana MCP server. Similarly, grafana/tempo-mcp-server focuses purely on distributed tracing.
Community alternatives: DrDroidLab/grafana-mcp-server offers a lighter approach with PromQL and Loki query tools. christian-schlichtherle/grafana-mcp focuses on dashboard discovery and editing across multiple Grafana clusters. Neither matches the official server’s breadth.
Who Should Use This
Use the Grafana MCP server if:
- You run Grafana for observability and want agents that can query metrics, logs, and dashboards
- You use multiple data backends (Prometheus + Loki + ClickHouse, etc.) and want a single MCP server
- You need configurable tool categories to manage context window budget
- You want the incident management loop: alerting → OnCall → investigations → dashboards
- You value open source and the ability to self-host or extend the server
Skip it (for now) if:
- You want zero-install OAuth setup — Grafana requires running the server yourself
- You’re on Grafana versions below 9.0 — key tools will silently fail
- You only need Sentry-style error tracking — Grafana is broader but shallower on specific debugging
- You need a hardened production integration — 61 open issues including security findings suggest it’s still maturing