Day 2 at the AI Engineer World’s Fair 2026 (Moscone West, San Francisco) was the Coding Agents day. The main stage made the central tension of the entire conference explicit: two back-to-back talks taking opposite positions on how autonomous coding systems are best understood — and therefore built.

The Main Stage Argument

At 11:10 AM, Tereza Tížková (Factory) presented “Rise of the Software Factory” — the case that AI coding systems are best understood as production lines. Agents have tasks, toolbelts (repos, test runners, deployment scripts, documentation), context (specs, architecture decisions, prior constraints), and feedback loops. You, the engineer, design the factory. The factory builds the software.

At 11:40 AM, Charlie Holtz (Conductor, CEO) followed immediately with “Orchestras, not Factories.” His counter: the factory metaphor is wrong in a way that matters. Factories produce interchangeable outputs through synchronized steps. Orchestras produce coordinated performance from independent instruments — each following the same score but with its own interpretation. The difference isn’t metaphor: it’s architectural. Factories imply assembly lines and queues; orchestras imply event-driven, asynchronous, harmonized-but-not-synchronized systems.
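
One way to read the architectural contrast is as a minimal Python sketch. Everything here is illustrative: the stage functions, the agent names, and the `run_factory` / `run_orchestra` helpers are hypothetical and do not come from either talk.

```python
import asyncio

# Factory reading: fixed stages run in sequence; each waits for the previous one.
def run_factory(task, stages):
    artifact = task
    for stage in stages:
        artifact = stage(artifact)
    return artifact

# Orchestra reading: independent agents work from the same "score" (the spec)
# concurrently; nothing forces them into lockstep.
async def run_orchestra(score, players):
    results = await asyncio.gather(*(player(score) for player in players))
    return dict(results)

# Hypothetical stages and players, for illustration only.
def plan(task):
    return task + " -> planned"

def implement(task):
    return task + " -> implemented"

def verify(task):
    return task + " -> verified"

async def backend(score):
    await asyncio.sleep(0.02)  # each player runs at its own tempo
    return "backend", f"API for {score}"

async def frontend(score):
    await asyncio.sleep(0.01)
    return "frontend", f"UI for {score}"

factory_out = run_factory("add login", [plan, implement, verify])
orchestra_out = asyncio.run(run_orchestra("add login", [backend, frontend]))
print(factory_out)
print(orchestra_out)
```

In the first shape, ordering is the coordination mechanism. In the second, the shared spec is, and the agents finish whenever they finish.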

The fact that the schedule placed these talks back-to-back suggests the AIEWF organizing team was deliberately surfacing the argument rather than endorsing either position. It worked: this was the most discussed main-stage pairing of the day.

Both positions agree on the fundamentals: autonomous coding agents are a production reality. The dispute is about the mental model that should govern how you build the infrastructure around them, and that model shapes API design, error handling, retry semantics, observability, and team structure.

Daksh Gupta’s Data: 1M+ AI-Generated PRs

At 12:05 PM, Daksh Gupta (Greptile co-founder and CEO) presented empirical data on AI-generated code at scale: numbers that neither the factory camp nor the orchestra camp had publicly put forward before.

Greptile has reviewed several million PRs across 65,000 organizations. The headline numbers:

  • 0.86% of PRs showed evidence of being fully AI-generated in February 2025
  • 27.6% were fully AI-generated by April 2026 — a 32x increase in 14 months
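
The 32x figure can be checked directly from the two data points. The monthly rate below is not from the talk: it is a back-of-the-envelope derivation that assumes smooth compounding between February 2025 and April 2026, which the real curve may not follow.

```python
start_share = 0.86   # percent of PRs fully AI-generated, February 2025
end_share = 27.6     # percent of PRs fully AI-generated, April 2026
months = 14

multiple = end_share / start_share

# Assumption: steady month-over-month compounding between the two points.
monthly_factor = multiple ** (1 / months)

print(f"overall multiple: {multiple:.1f}x")
print(f"implied monthly growth: {(monthly_factor - 1) * 100:.0f}%")
```

Under that assumption the share grew by roughly 28% per month. A share of PRs is capped at 100%, so the rate cannot simply be extrapolated forward.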

It is hard to find a precedent for that rate of adoption in software tooling. This is not a slow enterprise rollout; it looks like a step-function shift.

The quality data is the part that will get cited for months. Reversion rates for AI-generated PRs are approximately equal to those for human-written PRs. For larger PRs specifically, AI-generated code had lower revert rates than human-written code. By this measure, AI does not produce worse code at scale; it produces code that holds up comparably after merge, in larger units.

AI PRs are also about 20% larger by median lines of code than the same developer’s non-AI PRs. Agents tend to write complete implementations rather than small changes.

The implication Gupta drew: you can no longer assume a PR is human-authored. Your review process, your CI/CD pipeline, your security analysis, and your onboarding documentation all need to be rethought with the assumption that a significant and growing fraction of contributions are agent-generated.

GitHub Copilot Agents (Idan Gazit, GitHub Next)

In the Claws & Personal Agents track, Idan Gazit (Head of GitHub Next) presented “Build agents fast with GitHub Copilot.” The talk covered the current state of Copilot’s autonomous agent capabilities — beyond the autocomplete model and into fully agentic PR workflows.

GitHub’s position is that the repository is the natural unit of context for a coding agent. Not a file, not a function — the entire repo, with its history, its structure, its CI outputs, and its open issues. GitHub Next is building toward agents that treat the repo as a living artifact they can modify, test, and contribute to autonomously, with a human in the approval loop rather than the generation loop.

The session drew a full track room. The practical question attendees pushed on: what does “approval loop” mean when 27.6% of PRs are already agent-generated and teams are scaling? At some point the human can’t read every agent PR in depth.

Evals as the Rising Discipline

Two CTO Circle talks addressed what becomes necessary when you have 27.6% AI-generated PRs and you can’t manually validate all of them:

“Your Agent Evolved. Your Evals Didn’t” (Ameya Bhatawdekar, Braintrust VP Field CTO) — the case that most eval frameworks were designed for single-turn LLM calls and break down for multi-step agents. Agent behavior is path-dependent; a single-score eval misses the cases where the agent got the right answer via the wrong path.
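
To make the path-dependence point concrete, here is a minimal sketch contrasting a single-score check with a trajectory-aware one. The `Step` and `Run` types, the tool names, and both scoring functions are hypothetical; this is not Braintrust's API or the method presented in the talk.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    ok: bool

@dataclass
class Run:
    steps: list
    final_answer: str

def score_final_only(run, expected):
    # Single-score eval: only the end state counts.
    return 1.0 if run.final_answer == expected else 0.0

def score_with_path(run, expected, forbidden_tools, required_tools):
    # Trajectory-aware eval: also check how the agent got there.
    used = {step.tool for step in run.steps}
    return {
        "answer_correct": run.final_answer == expected,
        "no_forbidden_tools": not (used & set(forbidden_tools)),
        "required_tools_used": set(required_tools) <= used,
        "no_failed_steps": all(step.ok for step in run.steps),
    }

# Hypothetical run: the agent reports success but skipped the test suite.
run = Run(
    steps=[Step("edit_file", True), Step("skip_tests", True)],
    final_answer="tests pass",
)
single = score_final_only(run, "tests pass")
detailed = score_with_path(
    run,
    "tests pass",
    forbidden_tools=["skip_tests"],
    required_tools=["run_tests"],
)
print(single)
print(detailed)
```

The single score rates this run as a full pass, while the per-check breakdown flags both the forbidden shortcut and the missing test run.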

“Your Agent Didn’t Fail. Your Harness Did” (Vinoth Govindarajan, OpenAI MTS) — the complementary argument that what looks like agent failure is usually harness failure: bad context, wrong tool access, unclear success criteria, or insufficient feedback loops.
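
The four harness failure modes listed above lend themselves to a preflight check that runs before the agent does. The sketch below is an assumption about how such a check might look; the task dictionary and its keys are hypothetical and not taken from the talk or from any OpenAI tooling.

```python
def harness_preflight(task):
    """Return a list of harness problems found before the agent runs."""
    problems = []
    if not task.get("context_files"):
        problems.append("bad context: no context files attached")
    missing = set(task.get("required_tools", [])) - set(task.get("granted_tools", []))
    if missing:
        problems.append(f"wrong tool access: missing {sorted(missing)}")
    if not task.get("success_criteria"):
        problems.append("unclear success criteria: none defined")
    if not task.get("feedback_command"):
        problems.append("insufficient feedback loop: no command to check work")
    return problems

# Hypothetical task spec with two harness defects.
task = {
    "context_files": ["README.md"],
    "required_tools": ["run_tests", "edit_file"],
    "granted_tools": ["edit_file"],
    "success_criteria": "",
    "feedback_command": "pytest -q",
}
problems = harness_preflight(task)
for problem in problems:
    print(problem)
```

If this list is non-empty, a failed run says more about the harness than about the agent.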

Swyx’s analysis of the three-year arc of AI engineering, shared at the conference, frames these sessions in context: when the AI Engineer thesis was published in mid-2023, RAG and prompt engineering were the frontier topics. By 2026, those topics have “been pushed to the margins.” Evals are the discipline that replaced them at the frontier — the systematic, measurable approach to forward progress that mature engineering requires.

Anthropic Claude Managed Agents Workshop

The Claude Managed Agents Workshop (Priyanka Phatak and Gabriel Cemaj, Anthropic) ran as a four-part sequence from 10:45 AM to 12:05 PM. The workshop covered the patterns Anthropic recommends for orchestrating Claude in multi-agent systems: task decomposition, tool design, memory architecture, and the specific failure modes that emerge when agents call agents.

Lance Martin (Anthropic MTS) also presented “Claude for long-horizon tasks” in the afternoon — complementary to the workshop, focused on how to structure tasks that span hours rather than seconds, including how to handle context window limits, intermediate checkpointing, and task resumption.
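
Intermediate checkpointing and resumption can be sketched in a few lines. This is a generic pattern under stated assumptions, not Anthropic's recommended implementation: the step names, the JSON file layout, and the `fail_at` parameter (used only to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, path, fail_at=None):
    """Run named steps in order, saving progress after each one."""
    state = {"done": [], "notes": {}}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last saved checkpoint
    for name, fn in steps:
        if name in state["done"]:
            continue  # completed in an earlier run
        if name == fail_at:
            raise RuntimeError(f"interrupted at {name}")
        # Each step sees only compact notes from earlier steps,
        # not a full transcript, which keeps the carried context small.
        state["notes"][name] = fn(state["notes"])
        state["done"].append(name)
        with open(path, "w") as f:
            json.dump(state, f)
    return state

steps = [
    ("survey", lambda notes: "3 modules found"),
    ("refactor", lambda notes: f"refactored after: {notes['survey']}"),
    ("verify", lambda notes: "all checks green"),
]

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_with_checkpoints(steps, path, fail_at="refactor")  # simulated crash
except RuntimeError:
    pass
final = run_with_checkpoints(steps, path)  # picks up after "survey"
print(final["done"])
```

The second call skips the completed step and reuses its saved note. A production version would also want atomic writes and some validation of the loaded state.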

Closing: The Future of Cursor

Lee Robinson (Cursor ML/Model Behavior) closed Day 2 with “The Future of Cursor.” Cursor is currently the dominant AI coding editor by active users, and Robinson’s talk was directional: where does Cursor go from here?

The direction he outlined is toward ambient coding environments — systems that maintain context across sessions, remember decisions made in previous coding runs, and surface relevant constraints without being asked. Less “the AI that responds to your questions” and more “the development environment that already knows what you’re trying to do.”

The Argument That Matters for Builders

The factory vs orchestra framing is not aesthetic. It has practical consequences:

| Question | Factory answer | Orchestra answer |
|---|---|---|
| How do agents share context? | Shared queue / pipeline | Broadcast event / shared score |
| How do errors propagate? | Stop the line | Isolate the instrument |
| How do you add capacity? | Add workers | Add instruments |
| How do you measure output? | Throughput | Coherence |
| What does failure look like? | Stalled job | Off-key section |
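
The error-propagation row is the one that shows up most directly in code. A minimal sketch of the two policies follows; the `agent` coroutine and the job names are hypothetical, and neither function is presented as either speaker's design.

```python
import asyncio

async def agent(name, fail=False):
    await asyncio.sleep(0.01)
    if fail:
        raise ValueError(f"{name} failed")
    return f"{name} ok"

jobs = [("lint", False), ("tests", True), ("docs", False)]

async def stop_the_line(jobs):
    # Factory answer: run in order; the first failure halts everything downstream.
    done = []
    for name, fail in jobs:
        try:
            done.append(await agent(name, fail))
        except ValueError as exc:
            return done, f"halted: {exc}"
    return done, None

async def isolate_the_instrument(jobs):
    # Orchestra answer: run concurrently; a failure is recorded, the rest continue.
    results = await asyncio.gather(
        *(agent(name, fail) for name, fail in jobs),
        return_exceptions=True,
    )
    return [r if isinstance(r, str) else f"isolated: {r}" for r in results]

halted = asyncio.run(stop_the_line(jobs))
isolated = asyncio.run(isolate_the_instrument(jobs))
print(halted)
print(isolated)
```

In the first policy the `docs` job never runs. In the second it completes, and the failed `tests` job is reported alongside the successes.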

Gupta’s data suggests the practical consequence is already arriving faster than most engineering orgs have designed for. If 27.6% of PRs are agent-generated today, and revert rates are comparable to human PRs, the constraint shifts from “is the code good enough” to “how do we understand what agents are doing at scale.” That is an observability problem as much as a quality problem.

The evals track answers that question from the measurement side. The harness engineering track answers it from the infrastructure side. Both converged on Day 2.

What’s Next

Day 3 (July 1) is the Autoresearch day: multi-agent research systems, context engineering, and Barr Yaron’s 2026 AI Engineering Survey results. Historically, the survey has been the most-cited data product of the conference, a set of benchmarks against which teams measure their own practices.

Day 4 (July 2) is Harness Engineering: production infrastructure, agentic commerce, inference, and security.

ChatForest covers AI infrastructure from the builder’s perspective. ChatForest did not attend the conference; this recap is a synthesis of publicly available session data and talk descriptions from the AIEWF 2026 schedule and post-conference coverage.