Gartner published a press release on May 26, 2026, with a specific prediction: 40% of enterprises will demote or decommission autonomous AI agents by 2027 — after discovering governance gaps in production.
The failures are not coming from security incidents or capability problems. They are coming from a governance architecture error that almost every enterprise is making right now: treating all AI agents as if they are the same thing.
This is the risk builders need to understand before they ship an enterprise agent. Passing the security review is not enough. Enterprise buyers are learning — slowly, through production incidents — that a well-secured agent can still be the wrong agent if its permission scope does not match its autonomy level. And when that mismatch becomes visible in production, the agent gets rolled back.
The Root Cause Gartner Identified
Sanchit Vir Gogia, an analyst quoted in coverage of the Gartner research, put it clearly: applying one governance model to all agents is “like applying the same control regime to a receptionist and a surgeon because all use a laptop. It is tidy on paper. It is nonsense in practice.”
Enterprises are currently applying two approaches to AI agent governance, and both are wrong.
The locked-down approach: Treat all agents as high-risk. Require full approval workflows, audit trails, and kill-switch mechanisms regardless of what the agent actually does. Result: agents that observe documents and summarize them have the same overhead as agents that modify production databases. Builders know this pattern — it adds friction so severe that the agent becomes slower and more expensive than the human workflow it was supposed to replace. Adoption collapses.
The fully-trusted approach: Grant broad access on the assumption that the agent will work within implicit constraints. No explicit scope boundaries. No tiered controls. Result: an agent built for one task gradually accumulates permissions and context that were never intended for it. When something goes wrong, the blast radius is larger than the enterprise expected. The post-incident review identifies a governance gap. The agent gets decommissioned.
The 40% prediction follows from the fully-trusted failure mode. Enterprises are deploying autonomous agents with underspecified governance, and the production failures are happening — just slowly enough that the lessons have not yet propagated into procurement criteria for most buyers.
The Four-Tier Framework
Gartner’s recommended alternative is a proportional governance model that classifies agents by autonomy level and matches controls to each tier. The tiers are not suggestions — they define what governance is required before an agent at each level can legitimately operate in an enterprise environment.
Level 1: Observe
An observe agent has read-only access to a defined set of data sources. Its outputs go to the requesting user only — it does not write, communicate, or act.
Examples: document summarization, knowledge retrieval, code explanation, policy lookup.
Required controls at this level are lightweight: scoped data access (no access beyond what the task requires), user authentication, usage logging, and basic functional and security testing. The governance overhead should be minimal because the blast radius of an observe agent is minimal. If it hallucinates a summary, a human reads the wrong summary. That is the maximum failure mode.
The mistake enterprises make at Level 1: applying Levels 3 or 4 governance because “it has access to sensitive data.” If the access is read-only and scoped, the right control is scoping and logging — not approval workflows and kill switches.
Level 2: Advise
An advise agent generates recommendations, drafts, or proposed actions. Humans review all outputs and execute actions manually. The agent does not modify anything.
Examples: email drafting, report generation, decision support dashboards, candidate scoring in hiring workflows.
The governance requirements expand relative to Level 1, but the expansion is appropriate to what has changed. The agent’s output now influences decisions that humans will execute. That introduces hallucination risk in a new form: a well-confident wrong recommendation that a human follows without independent verification.
Required additions at Level 2: accuracy testing relevant to the domain (not just generic benchmark scores), hallucination detection tuned to the specific task, domain-specific quality evaluation, and explicit user training on appropriate reliance levels. That last point is governance of the human-in-the-loop, not just the agent. An advise agent only works correctly if the human reviewing its output understands the agent’s error profile.
Level 3: Act with Approval
A Level 3 agent can execute actions — write to databases, send communications, modify configurations — but only after explicit human approval. The human remains in the loop on every consequential action.
Examples: customer service ticket resolution, code commit on review, vendor invoice processing, scheduling changes with external parties.
This is the tier where governance architecture becomes non-negotiable for enterprise procurement. The distinction between Level 3 and Level 4 is not technical — it is the role of the human in the loop. At Level 3, humans approve before action. At Level 4, humans review exceptions after action. That difference carries significant audit and liability implications in regulated industries.
Required controls at Level 3: strong security testing (the agent is now a write-capable system), clear approval workflows with defined escalation paths, complete audit trails with action provenance, and incident response procedures specific to this agent’s action surface.
The governance mistake at Level 3: assuming that “human in the loop” is a sufficient control without specifying what constitutes meaningful review. If the approval interface is a rubber-stamp that processes 200 approvals per minute, the human is in the loop formally but not substantively. Enterprise buyers in regulated industries are beginning to distinguish between formal and substantive human review — and agents whose approval workflows cannot demonstrate substantive review are being downgraded to Level 2.
Level 4: Autonomous Execution
A Level 4 agent acts independently within defined guardrails. Humans review exceptions only — not individual actions. The agent modifies production systems at machine speed.
Examples: infrastructure scaling, trading execution, clinical decision support in low-complexity cases, large-scale code refactoring.
The governance requirements at Level 4 are not extensions of Level 3 — they are a different category. Valence Howden, quoted in Gartner’s coverage, describes this tier as requiring “anti-fragile adaptive models” — governance that responds to edge cases the original design did not anticipate.
Required controls at Level 4: comprehensive guardrail definitions with explicit boundary conditions, rollback capabilities for every action category the agent can take, continuous monitoring with anomaly detection, kill-switch mechanisms that trigger automatically on defined conditions, red team testing before production deployment, clear ownership assignment (someone’s name is on this agent’s behavior), and business continuity procedures for agent failure.
The governance mistake at Level 4: conflating guardrails with constraints. A constraint is a hard limit. A guardrail is a soft boundary with a defined response when it is approached. Guardrails are more expensive to implement and more flexible in production. Agents that reach Level 4 deployment need guardrails, not constraints, because the scenarios they encounter will not be fully specified by any design process.
The Distinction That Kills Most Agents
Gartner identifies a specific conflation error at the root of most governance failures: enterprises do not distinguish between an agent’s ability to act and the scope of access they have granted it.
An agent’s ability to act is a technical property of its design. An observe agent technically cannot modify data. A Level 4 agent technically can.
An agent’s scope of access is a governance decision. An observe agent can be granted access to the entire corporate document repository or only to a specific project folder. A Level 4 agent can be granted write access to all production systems or only to a single, bounded microservice.
The failure pattern: enterprises evaluate ability to act, accept that a Level 3 agent “has human approval so it’s safe,” and then grant that agent access far beyond what its specific task requires. When the agent reaches an unexpected edge case and the approval workflow fails — either because the approver misunderstood the action or because the workflow was bypassed during a high-volume period — the blast radius extends to everything the agent has access to, not just what it was designed to touch.
Gartner’s summary principle: “Do not scale agents faster than you can govern their authority." The authority here is access scope, not technical capability. Builders can ship a Level 4 agent that is technically autonomous while keeping its access scope narrow enough that a governance failure produces a contained incident rather than a catastrophic one.
What This Means for Builders Building Enterprise Agents
The 40% rollback prediction is a description of the current trajectory. Enterprises are deploying agents faster than they are developing governance frameworks, and the failures are accumulating in production. By 2027, a significant share of those deployments will be reversed.
For builders, this creates a specific near-term pressure: enterprise buyers who have already been through one governance failure are applying substantially more rigorous review to new agent deployments. The procurement question is no longer just “does it work?” — it is “what tier is this agent, and does the governance posture match the tier?”
Classify your agent clearly. Know what tier your agent operates at and make that classification legible to buyers. An agent that is ambiguous about its autonomy level will be provisionally classified at the highest level the buyer can imagine — and governed (or rejected) accordingly.
Narrow the access scope to the minimum. Regardless of tier, an agent that requests access only to what it needs signals governance maturity. Enterprise buyers are increasingly using access scope requests as a proxy for the builder’s understanding of their own agent’s failure modes.
Build the Level 3 → Level 4 transition explicitly. Many agents start as Level 3 (act with approval) and are intended to evolve toward Level 4 (autonomous) as the enterprise gains confidence. Build the approval workflow in, design the monitoring and kill-switch infrastructure, and define the specific metrics that would trigger a tier upgrade. Buyers who can see a clear path from supervised to autonomous deployment are more likely to approve initial deployment.
Connect governance to the security procurement conversation. The enterprise security gate we covered after the Geordie $30M raise addresses the pre-deployment moment — security teams blocking agentic deployments before they start. The Gartner governance framework addresses the post-deployment failure mode — agents that pass security review but get rolled back because their governance posture cannot survive production reality. Builders building for enterprise sale need to address both. A well-secured agent without tiered governance will eventually surface a production failure that triggers decommissioning. A well-governed agent that cannot pass security review never ships.
The 40% is not a fixed outcome. It is a description of what happens when governance does not keep pace with deployment speed. Builders who build the governance architecture into the agent design — not as a layer applied after the fact — are building agents that survive long enough to prove their value.