At a glance: Microsoft made computer use agents generally available in Copilot Studio on May 13, 2026. Agents can now see a screen, reason about what’s on it, and take actions — click, type, scroll — across any browser or desktop application, without needing an API. Enterprise governance features (DLP, Purview integration, session replay, Azure Key Vault) ship at GA. Desktop app success rates (~35%) and handling of dynamic UI elements remain real limitations. Part of our Microsoft & Azure coverage. For related enterprise AI, see our Microsoft Agent 365 review.
Every enterprise has applications that haven’t shipped an API. Some of those applications run mission-critical workflows — HR systems, legacy ERPs, procurement portals, internal tools built on platforms that were never designed to be automated. For years, the answer was robotic process automation: record a macro, specify exactly which pixel to click, hope the UI doesn’t change.
Computer use in Copilot Studio takes a different approach. Instead of recording exact coordinates, the agent looks at the screen, understands what it sees, and decides what to do next — the way a person would. It’s the same vision-and-reasoning approach that Anthropic pioneered with Claude’s computer use capability, now integrated directly into Microsoft’s enterprise agent platform.
With GA on May 13, this is no longer a preview experiment. It’s a production-ready capability with enterprise governance built in.
What Computer Use Actually Does
When you add the computer use tool to a Copilot Studio agent, you give it four things: a browser, a screen, a keyboard, and the ability to read what’s on the page and take the next logical step.
The agent doesn’t navigate via selectors. It captures a screenshot, processes what it sees using a vision model, decides what action to take (click, type, scroll, or wait), executes that action, and repeats until the task is done or it needs human input. The workflow branches naturally around what’s actually on screen at each moment — not what the developer expected to be there when they built the automation.
This matters in practice because UIs change. A button moves. A workflow gets an extra confirmation screen. A vendor updates their portal layout. Selector-based automation breaks when this happens; computer use adapts.
Setup
Getting started is intentionally low-friction: create or open an agent in Copilot Studio, go to Tools → Add tool → Add new computer use, and describe the task in natural language. The agent handles the rest. There’s no macro recording, no XPath selectors, no coordinate capture.
The Governance Story
Microsoft has put substantial work into making computer use acceptable to enterprise IT and compliance teams, not just power users.
DLP and environment isolation. Computer use agents operate under the same Data Loss Prevention policies and environment boundaries as the rest of Power Platform. If your DLP policy prevents an agent from accessing a particular data domain, that restriction applies here.
Purview and Dataverse integration. Every run is logged. Audit trails propagate to Microsoft Purview and Dataverse, making it possible to review what the agent saw, what it clicked, and what data it touched. For regulated industries, this changes the feasibility calculus significantly.
Session replay. Administrators and makers can watch exactly what the agent did during a run: step-by-step screenshots, action types (click/type/scroll), coordinates, timestamps, and context notes. This is the kind of observability that traditional RPA tools have provided for years, now available for AI agents.
Run summaries. Each run produces a structured summary: total duration, action count, average time per action, number of human escalations, and instruction text. Useful for both debugging and capacity planning.
Credentials and Security
Computer use agents often need to authenticate into systems — which creates an obvious question about how credentials are handled.
Microsoft provides two options:
- Internal storage: credentials encrypted within Power Platform. Lower friction, appropriate for most scenarios.
- Azure Key Vault: enterprise-grade secret management with your existing key infrastructure.
In both cases, credentials are never exposed to the AI model itself. The model sees the screen; it doesn’t see the password. Authentication happens at the infrastructure layer.
Human-in-the-Loop
Computer use doesn’t have to run fully autonomously. Copilot Studio supports human-in-the-loop checkpoints — moments in the workflow where the agent pauses, describes what it’s about to do, and waits for approval before proceeding.
These checkpoints can be configured to trigger at specific steps or to activate automatically when the agent’s confidence is below a threshold. If something on screen doesn’t match what the agent expected, it escalates rather than guessing.
This is meaningful for high-stakes workflows. A computer use agent filing expense reports or updating inventory records can be configured to pause before any write operation, turning what might be an autonomous action into a supervised one.
Model Choice
Copilot Studio’s computer use supports multiple underlying vision models: options from both OpenAI and Anthropic are available. This gives organizations flexibility to align model choice with existing vendor relationships or compliance requirements.
Where It Works Well — and Where It Doesn’t
The honest performance picture is a wide gap between browser-based and desktop tasks.
Web apps: ~80% success
For browser-accessible workflows — pulling data from a vendor portal, submitting a form, navigating an HR system’s web interface — computer use performs reliably. The vision model handles layout changes well. This is where the legacy software use case is strongest: SAP Fiori, Oracle Web Applications, internal intranets, procurement portals. If a person can access it through a browser, the agent generally can too.
Desktop apps: ~35% success
Native desktop applications are harder. Rendering is more variable, accessibility APIs are inconsistent, and the agent has less surface area to reason about. The success rate at GA is roughly 35% — meaningful enough to be useful for specific tasks, not reliable enough to replace human-operated workflows end to end.
Dynamic UI elements
Dropdowns, date pickers, multi-select widgets, and custom-built interface components cause problems across both environments. These elements behave differently from what the model was trained on, and the agent can enter a loop if the screen doesn’t respond the way it expected. Microsoft documents this explicitly in the troubleshooting guide.
Rate limits
Computer use runs count against Copilot Studio message quotas. Per-minute and per-hour limits apply; exceeding them blocks agent messages until the quota resets. For high-volume automation scenarios, this requires capacity planning.
Use Cases Worth Taking Seriously
Legacy ERP data extraction. An agent that logs into SAP, navigates to a report view, extracts data into a table, and closes the session — without any SAP API integration — is a genuinely valuable automation for finance and supply chain teams.
Application-to-application data transfer. Moving data between systems that have no integration and no API: pull from system A, paste into system B. Not elegant, but it replaces a human doing the same task.
Process compliance documentation. An agent that walks through a workflow in a regulated system and produces a screenshot-by-screenshot audit trail of what was done, by whom (or what), and when.
Web-based vendor workflows. Submitting orders, checking shipment status, pulling invoices from supplier portals — all without waiting for a vendor to build an API.
What This Is Not
Computer use is not a replacement for proper API integration where APIs exist. An API call is faster, more reliable, and more auditable than a vision-based UI interaction. If the system you’re automating has an API, use it.
It’s also not a general-purpose robotic process automation platform replacement yet. The desktop success rate is too low to confidently replace existing RPA deployments without significant testing.
What it is: an escape hatch for everything that falls outside what APIs and RPA can reach. That’s a large and valuable category.
Pricing
Computer use in Copilot Studio runs on usage-based billing. An Azure subscription is required. Microsoft doesn’t publish per-run pricing directly on the feature page; costs accumulate against Copilot Studio message capacity, which is purchased through Azure or Microsoft 365 licensing.
Organizations already paying for Power Platform or Microsoft 365 E5 licenses should check whether computer use is included in their existing allocation before provisioning additional capacity.
Bottom Line
Computer use in Copilot Studio GA is a real enterprise capability, not a demo. The governance story — DLP, Purview, session replay, Key Vault, human checkpoints — is mature enough for production deployment in regulated environments. The web-based performance is solid.
The desktop gap (35% success) is the honest limitation that most coverage skips over. If your automation target is a native Windows application that doesn’t render in a browser, computer use is not a reliable solution yet.
For organizations with legacy web-based workflows and no API access — which describes a large share of enterprise software estates — this is worth evaluating now.
Rating: 3.5 / 5 — Production-ready for web workflows. Early-stage for desktop. The governance story is genuinely good.
Computer use in Copilot Studio is available in all commercial Power Platform geographies. Azure subscription required. For setup documentation, see Microsoft Learn. This review is based on publicly available documentation and third-party coverage; ChatForest has not independently tested the product.