At Google I/O 2026, Sundar Pichai said that Google is now processing more than 3.2 quadrillion tokens per month across its products and APIs.
That number is large enough to be meaningless without context. So here is the context:
| Period | Monthly tokens | YoY growth |
|---|---|---|
| May 2024 | 9.7 trillion | — |
| May 2025 | ~480 trillion | ~49× |
| May 2026 | 3.2 quadrillion | ~7× |
A quadrillion is 10¹⁵. 3.2 quadrillion is 3,200 trillion. In two years, Google’s token throughput has grown approximately 330× — doubling, on average, faster than every 60 days.
This is not a product announcement. It is a measurement of something that is already happening, at a scale that changes how builders should think about the economics of building on AI infrastructure.
What Quadrillion-Scale Looks Like From the Inside
The headline figure is a platform total — every product that runs on Gemini, including Search, Google Workspace, YouTube summaries, Android AI features, and consumer Gemini apps. It is not exclusively API traffic.
The API-specific numbers are smaller but still striking:
- 19 billion tokens per minute through the Gemini Developer API
- 375+ Google Cloud customers each processed more than 1 trillion tokens over the past 12 months — roughly 85 billion tokens per customer per month
- 8.5 million developers building new apps and experiences with Gemini models monthly
The 375-customer figure is the one worth focusing on. These are organizations — not individuals — running production workloads at a trillion tokens or more per year. That is a new class of AI user that did not exist two years ago. Their workloads are not chatbots. They are pipelines: ingestion, classification, extraction, synthesis, code generation, agent loops running continuously.
Why the Growth Is Not Linear
The 7x year-over-year jump does not come primarily from more users asking more questions. It comes from the architecture of agentic workloads.
A human asking a chatbot a question might consume 500–2,000 tokens per exchange. A coding agent completing a feature branch might consume 500,000–2,000,000 tokens per task — context loading, tool calls, intermediate reasoning, output generation, verification loops. A background agent running for hours on a document corpus might consume more than that before producing a single output.
When developers shift from chat interfaces to agent pipelines, their token consumption does not increase by 10×. It increases by 100–1,000×. Google’s I/O 2026 numbers reflect an ecosystem in the early stages of that transition. The growth is structural, not behavioral.
The phenomenon even has a name now. The Register‘s I/O 2026 coverage flagged “tokenmaxxing” as a cultural reality in the developer community: deliberately maximizing token consumption to extract model capabilities — large context windows, chain-of-thought prompting, multi-step reasoning — because at current price points, it is often cheaper to spend more tokens than to engineer around them.
The Pricing Consequence
High volume at the infrastructure layer creates downward pressure on per-token price. This is not speculation — it is already visible in the pricing announcements at I/O 2026.
Gemini 3.5 Flash launched on May 19, 2026 at $1.50 / $9.00 per million input/output tokens. It outperforms the prior-generation Gemini 3.1 Pro ($2.00 / $12.00) on agentic benchmarks — 83.6% on MCP Atlas versus 3.1 Pro’s lower score — at 25% lower cost.
This is the recurring pattern in commodity infrastructure markets: the dominant provider uses volume to fund price cuts that lock in the next generation of workloads before competitors can match on both capability and economics.
For builders with high-volume agent workloads, the effective cost reductions compound further:
- Batch API: 50% discount for non-realtime jobs (Gemini 2.5 Flash-Lite falls to $0.05 / $0.20 per million tokens in batch mode)
- Context caching: up to 90% reduction on cached prompt tokens for large repeated context windows
- Committed use: enterprise contracts at scale trigger additional negotiated discounts
Google cited that if top companies shifted 80% of their frontier-model workloads to Gemini 3.5 Flash, they would save over $1 billion annually. That figure is an internal calculation, but it reflects the order-of-magnitude cost gap that now exists between frontier-class and mid-tier-class models — and mid-tier-class models in May 2026 outperform frontier-class models from eighteen months ago.
What This Means If You’re Building at Scale
The cost baseline is shifting faster than most roadmaps account for. A pricing assumption baked into a product in January 2026 may already be wrong. The Gemini 3.5 Flash launch cut costs 25% while improving benchmark performance; the direction of this trend is not ambiguous.
The trillion-token customer club is growing. 375 organizations each processed a trillion or more tokens with Google Cloud in the past year. This group will expand significantly over the next 12 months. If your application is approaching that tier, you have leverage for enterprise pricing conversations that did not exist two years ago — and you should be thinking about multi-cloud token routing to maintain negotiating position.
Agentic workload economics are different from chat economics. The standard “tokens per user per day” metric that made sense for chat interfaces does not port to agent pipelines. Capacity planning for agent workloads requires estimating task-level token consumption — context window utilization per agent invocation, loop depth, tool call overhead — not per-user averages. The builders running trillion-token organizations are almost certainly doing this per-pipeline, not per-seat.
Google’s scale commitment is a signal about infrastructure stability. $180–190 billion in annual capex, TPU 8th generation, 3.2 quadrillion tokens of monthly throughput — these numbers indicate that Gemini’s infrastructure is not going to experience the capacity constraints that plagued early GPT-4 deployments. For workloads where predictable latency and availability matter more than lowest-possible cost, this matters as much as pricing.
The Uncomfortable Reading
Google’s 3.2 quadrillion tokens figure is also a proxy for competitive moat. The company is not just processing tokens — it is accumulating usage patterns, feedback loops, and implicit preference data at a scale that no competitor processes from a single product surface.
OpenAI does not publish equivalent aggregate figures. Anthropic’s revenue trajectory ($4.8B Q1 2026 → projected $10.9B Q2 2026) suggests significant volume, but their token throughput is not disclosed. The competitive picture is not “Google processes the most tokens.” It is “Google is the only company that has disclosed a platform-level figure in the quadrillions, at the moment when that scale begins to produce qualitatively different infrastructure economics.”
Whether that matters depends on what you’re building. For most developers, the practical implication is simpler: the token economy is real, it is large, and the pricing direction is down. Build accordingly.
Coverage note: This builders-log focuses on the economic and infrastructure implications of Google I/O 2026’s token scale announcement. For the technical architecture Google revealed at I/O — Gemini 3.5 Flash, Antigravity 2.0, ADK 2.0, Managed Agents API, Gemini Spark, WebMCP — see our Google I/O 2026 system analysis.