Jensen Huang keynotes NVIDIA GTC Taipei on Monday, June 1 at 11 a.m. Taipei time (Sunday 11 p.m. EDT, Sunday 8 p.m. PDT). The event runs through June 4 as part of Computex 2026. This article is written the day before the keynote, from confirmed leaks, official NVIDIA disclosures, and OEM product databases.

The reason to read this now rather than after: the announcements will move fast and generate hype on both ends. Knowing what was already confirmed before the stage lights go up helps you evaluate the new pieces without getting swept along.


What Is GTC Taipei and Why Does It Matter This Year

GTC Taipei is NVIDIA’s first dedicated developer conference in Asia, timed to coincide with Computex 2026, where every major chip CEO will be present. Jensen Huang described 2026’s second half as “very, very busy” in a pre-event interview — specifically naming Grace Blackwell, Vera Rubin, and “a surprise new product that we haven’t told anyone about yet.”

For builders, the signal is not the hardware specs. The signal is the inference cost curve. NVIDIA has framed Vera Rubin around one number: 10x lower cost per token than Blackwell. That number — if it holds in production cloud deployments in H2 2026 — changes what models are economically viable to build products on.


The Five-Layer Cake: The Framework Behind Everything Else

Jensen Huang uses the “five-layer cake” to explain how NVIDIA thinks about the full AI infrastructure stack:

  1. Energy — raw power generation. NVIDIA’s position: intelligence generated in real time requires power generated in real time. Power is now the primary constraint on AI scaling, not chips.

  2. Chips — processors that transform energy into computation. NVIDIA’s domain.

  3. Infrastructure — land, power delivery, cooling, networking, the systems that orchestrate tens of thousands of chips into one machine.

  4. Cloud data centers — the distributed computing layer where training and large-scale inference happen.

  5. Models and applications — the layer builders and their users actually interact with.

The practical point for builders: NVIDIA is not selling chips. It is selling a complete claim on the bottom three layers of this stack, which constrains what is possible at the top two. When NVIDIA says inference costs will drop 10x, they mean the infrastructure and chip layers are changing in ways that flow upward. When cloud providers upgrade to Vera Rubin systems in H2 2026, per-token pricing on their APIs should reflect it.

Understanding the five-layer cake helps you predict pricing cycles, not just absorb them.


Vera Rubin NVL72: What “5x Inference” Actually Means

NVIDIA unveiled the Vera Rubin platform in detail earlier this year. GTC Taipei is where H2 deployment timelines and partner commitments will be formalized.

The hardware:

  • 72 Rubin GPUs + 36 Vera CPUs in a single liquid-cooled rack
  • 3.6 EFLOPS of NVFP4 inference throughput at the rack level
  • 260 TB/s all-to-all NVLink scale-up bandwidth
  • HBM4 memory: up to 22 TB/s per GPU, versus 8 TB/s on B200 (a 2.75x jump)
  • Up to 288 GB per GPU

The claims:

  • 5x inference performance versus Blackwell GB200 per rack
  • 10x lower cost per token versus Blackwell
  • For MoE model training, Vera Rubin requires 1/4 the GPU count of Blackwell for the same workload

When it arrives:

Full production is confirmed as of Q1 2026. Cloud availability from AWS, Google Cloud, Microsoft Azure, OCI, and CoreWeave is expected H2 2026.

What this means for builders:

The 10x cost-per-token claim is the one to watch. It does not mean your bill drops 10x on June 2. It means: as cloud providers absorb Vera Rubin capacity across Q3 and Q4 2026, frontier model inference pricing will compress, and model tiers that are currently too expensive for broad production use will become economically viable.

The most direct implication: if you are today using GPT-5.5 Instant or Gemini 3.5 Flash because frontier models are too costly for your volume, the H2 2026 pricing environment may change that calculation. Do not lock in multi-year pricing agreements for inference that do not have adjustment clauses tied to hardware generation transitions.

HBM4’s bandwidth jump is also significant for context-heavy workloads. Long-context inference (1M+ tokens) is constrained primarily by memory bandwidth, not compute. A 2.75x improvement in per-GPU bandwidth means ultra-long-context calls will be faster and cheaper on Vera Rubin systems than on Blackwell — before any algorithmic improvements.


N1X: The First NVIDIA Chip Designed for a Laptop

The N1X is the announcement most covered by consumer tech press and most consistently underframed for builders.

What it is:

  • ARM-based SoC co-developed with MediaTek
  • 20 ARM cores in a hybrid config: 10 Cortex-X925 performance cores + 10 Cortex-A725 efficiency cores, boosting to ~4.0–4.2 GHz
  • 6,144 CUDA cores using Blackwell architecture (equivalent core count to the desktop RTX 5070)
  • Up to 128 GB LPDDR5X unified memory, shared between CPU and GPU
  • TSMC 3nm process

What it is not:

The N1X GPU is not a discrete RTX 5070 in a laptop. Unified memory sharing and lower clock speeds put it at roughly 25% of the discrete RTX 5070’s peak performance. For consumer gaming, that is a significant gap. For AI inference, it is a different story.

When it ships:

Jensen Huang is expected to formally reveal N1X laptops at the June 1 keynote. OEM partners confirmed through leaks include Dell, Lenovo, and Asus. First devices: October 2026. Broad availability: early 2027. Wide availability was reportedly limited in 2026 — expect the October devices to be premium SKUs.

What this means for builders:

128 GB of unified memory on a laptop changes what local inference is possible. Current M4 Max MacBook Pros offer up to 128 GB. The N1X matches that ceiling. Llama 4 Scout (109B parameters) runs in roughly 55-60 GB at 8-bit quantization. A 70B model fits in 128 GB at 4-bit quantization with room for context.

This matters for three categories of builders:

  1. Enterprise software for regulated industries — healthcare, legal, finance. Customer data that cannot leave a device or internal network. The N1X makes locally-run frontier-class models a realistic default for these buyers rather than a niche workaround.

  2. Developer tools — coding assistants that can run fully offline, without API keys or rate limits. The N1X combined with 128 GB unified memory makes a local agentic coding loop viable for the first time on a mainstream laptop form factor.

  3. Consumer apps in low-connectivity markets — 128 GB of unified memory with Blackwell CUDA cores enables on-device experiences that were previously impossible without cloud connectivity.

The caveat: wide availability is 2027. If you are building for N1X as a target platform, you have time. But the architectural direction is now confirmed.


The Mystery Product

Jensen Huang said in a pre-event interview: “The second half of this year is going to be very, very busy with Grace Blackwell, Vera Rubin, and we have a surprise new product that we haven’t told anyone about yet.”

He did not say whether the mystery product is infrastructure, consumer, or something else. Pre-event analysis points toward a consumer-facing AI device, possibly related to NVIDIA’s “Project DIGITS” direction — an AI PC appliance for developers and power users. Given the proximity to the N1X laptop announcement, it may be positioned as the complement: the desktop counterpart to the N1X laptop chip.

Treat this as a watch item. It could be significant for builders or it could be a consumer positioning move. Wait for the actual reveal.


Physical AI: The Builders Who Should Pay Attention

GTC Taipei includes two full days of Physical AI sessions: robotics, industrial AI, simulation, and semiconductor workflows.

Most AI builders writing LLM applications do not need to track this closely. But there is one intersection worth noting: NVIDIA is framing its GPU infrastructure investment as not just for cloud LLM training, but for agentic AI that controls physical systems. The same chips powering your API calls also power industrial robot fleets and autonomous vehicle stacks.

For builders targeting industrial clients, the Physical AI sessions are where NVIDIA will formalize the Jetson Thor roadmap — the edge AI module delivering up to 2,070 FP4 teraflops at the edge. If you build for manufacturing, logistics, or any environment where AI needs to operate in a physical space without reliable cloud connectivity, the Jetson Thor specs are worth tracking.


What to Watch For During the Keynote

Confirm or update:

  • Cloud partner availability dates for Vera Rubin NVL72 (the exact H2 window — Q3 vs Q4 changes the pricing timeline significantly)
  • N1X OEM pricing tiers (premium-only at launch, or any mid-range SKUs)
  • Inference pricing guidance from any cloud partners on stage

Watch for new information:

  • The mystery product reveal
  • Any A2A or MCP-related announcements (NVIDIA has been positioning agents as first-class Vera Rubin customers)
  • Anything about NemoClaw GA or enterprise OpenClaw alternatives (NVIDIA’s enterprise agent framework track)

Ignore for now:

  • Benchmark numbers announced on stage — keynote benchmarks are optimized configurations, not production averages
  • Consumer GPU gaming announcements — irrelevant to builder cost calculations
  • Anything branded “AI PC” that is not the N1X — marketing framing, not an architecture change

The Builder Calculus

Two things from GTC Taipei are structurally important for builders’ H2 2026 plans:

1. Vera Rubin changes the inference cost floor. If cloud providers absorb Vera Rubin capacity across Q3–Q4 as announced, the economic case for using larger, more capable frontier models improves significantly. This should factor into product roadmaps being set now — not as a given, but as a credible downside constraint on conservative cost estimates.

2. N1X normalizes 128 GB unified memory in a laptop. The practical consequence is that local model inference becomes a mainstream hardware capability, not a power-user edge case, by 2027. Builders who design now with the assumption that users can only run small on-device models are designing for a constraint that expires.

Both of these are structural changes to the infrastructure layer, not quarterly product cycles. They do not require action today. They require updating the assumptions in long-horizon product roadmaps.

The keynote is live June 1 at 11 a.m. Taipei time. A replay will be available on NVIDIA’s website.