At a glance: NVIDIA Cosmos 3, released June 1, 2026 at GTC Taipei, is the first fully open omnimodel for physical AI. It generates text, images, video, ambient sound, and robot action sequences in a single unified model — built on a novel Mixture-of-Transformers (MoT) architecture. Available free on HuggingFace under the OpenMDW-1.1 license, it tops three major physical AI benchmarks and reduces robotics training cycles from months to days.
Rating: 4.5/5 — Best-in-class for physical AI development; open weights make it especially compelling. Niche audience (robotics/AV), and hardware requirements are substantial.
What Cosmos 3 Is
Most world models generate video. Cosmos 3 generates actions.
NVIDIA’s Cosmos platform has existed since Cosmos 1, but Cosmos 3 marks a qualitative leap: it is the first omnimodel to natively understand and produce all of: text, images, video, ambient sound, and robot action sequences (joint angles, gripper positions, motion trajectories). The prior generation required separate components for reasoning vs. generation. Cosmos 3 puts them in a single model.
The practical application: a robot developer can give Cosmos 3 a task description, a scene image, and target objects. The model generates a physically plausible simulation of the robot completing the task — and exports the action trajectory data needed to train the robot’s policy model. Simulation-to-real training cycles that previously took months are now reported to take days.
Architecture: Mixture-of-Transformers
Cosmos 3 introduces a Mixture-of-Transformers (MoT) architecture — a two-tower design:
- Autoregressive transformer for discrete token generation (reasoning, text, actions)
- Diffusion transformer for continuous multimodal generation (images, video, sound)
The two towers are not separate models that are chained together — they are trained jointly and share state. This allows the model to reason about physics (e.g., “if this arm swings forward, the object will slide to the left”) and then generate a video that accurately reflects that outcome, alongside the corresponding joint angles for a real robot to execute.
This is architecturally distinct from the “chain a reasoning LLM with a video generation model” approach that characterizes most physical AI pipelines today. Whether the joint training provides meaningful gains over careful chaining in practice is something independent benchmarks are beginning to test, but NVIDIA’s internal results are strong.
Model Variants
Cosmos 3 ships in three tiers:
| Variant | Target Use Case | Speed vs. Quality |
|---|---|---|
| Cosmos 3 Super | Post-training for robotics and autonomous vehicle models | Highest physics accuracy; slower inference |
| Cosmos 3 Nano | Real-time reasoning at inference latency | Fast; lower fidelity |
| Cosmos 3 Edge | On-device inference (coming soon) | TBD |
Most builders working on data generation pipelines will use Cosmos 3 Super. Teams deploying on robots or edge compute will use Cosmos 3 Nano now and Cosmos 3 Edge when it ships.
Specialized checkpoints are also available — including Cosmos3-Nano-Policy-DROID, a fine-tuned variant for the DROID robot dataset, and Cosmos3-Super-Text2Image for image generation tasks.
Benchmark Performance
Cosmos 3 tops three physical AI-relevant leaderboards:
- Physics-IQ: Measures physical understanding in generated video (do objects fall correctly? Do materials behave as expected?)
- R-Bench: Robotics-specific evaluation
- PAI-Bench: Physical AI general benchmark
Among open models, it also leads TAR leaderboards for vision understanding.
Caveat standard to all NVIDIA-reported benchmarks: these are first-party results. Third-party replication is in progress. The physical AI benchmark ecosystem is newer and less standardized than software coding benchmarks, so cross-lab comparison is still evolving.
Licensing and Access
Cosmos 3 is released under OpenMDW-1.1 — NVIDIA’s open model license that allows research, commercial use, and fine-tuning with attribution and a few use-case restrictions (no weapons, no surveillance, standard responsible AI carve-outs).
Models are available now:
- HuggingFace:
nvidia/Cosmos3-Super,nvidia/Cosmos3-Nano, and fine-tuned variants - GitHub: github.com/NVIDIA/cosmos — runnable inference, training, and evaluation workflows
There is no API pricing because there is no hosted API. You download the weights and run them yourself. That means zero per-token cost — and significant hardware cost instead.
Hardware Reality Check
NVIDIA does not publish minimum hardware specs in the launch materials, but given the architecture and scale, practical expectations:
- Cosmos 3 Nano: Likely runnable on a single high-end GPU (A100/H100 class)
- Cosmos 3 Super: Almost certainly requires multi-GPU or high-memory GPU (H100 80GB+)
For most builders, cloud GPU instances (Lambda, RunPod, or NVIDIA’s own DGX Cloud) are the practical path unless you have on-prem hardware. The open-weight model means you pay for compute, not per-request fees — which can be economical at scale for data generation pipelines.
Who This Is For
Cosmos 3 is specifically built for:
- Robotics engineers building manipulation or locomotion policies via sim-to-real transfer
- Autonomous vehicle teams generating synthetic training scenes with accurate physics
- Physical AI researchers evaluating world models and developing policy networks
- Simulation engineers at gaming or VFX studios who need physically accurate scene generation
This is not a general-purpose language model, multimodal assistant, or code generation tool. If you’re not working on physical AI — robots, drones, vehicles, or physically grounded simulation — Cosmos 3 is not your tool. But if you are, it is currently the strongest open-weight option available by a meaningful margin.
NVIDIA Cosmos Coalition
NVIDIA launched the Cosmos Coalition alongside Cosmos 3, bringing in partners to validate and extend the platform:
- Agile Robots — humanoid robotics
- Black Forest Labs — image generation infrastructure
- Generalist — general-purpose robotics
- LTX — video generation
- Runway — media/VFX workflows
- Skild AI — robot learning
The coalition is important signal: it means Cosmos 3 is being stress-tested in production robotics pipelines by teams building real products, not just evaluated in academic settings.
Builder Verdict
If you work in physical AI, Cosmos 3 is the clearest open-weight step change since the original Cosmos release. The MoT architecture’s joint training of reasoning and generation is genuinely novel. Free weights under a permissive license means there’s minimal cost to experiment.
The gaps to watch: hardware requirements remain high (multi-GPU for Super), Cosmos 3 Edge isn’t shipping yet, and first-party benchmarks need third-party validation to fully trust.
For a software-focused builder who just wants an LLM or coding assistant, Cosmos 3 is infrastructure — relevant only if you’re building pipelines for robots or vehicles.
Rating: 4.5/5. Best open physical AI foundation model available. Narrow audience, high hardware bar, but for that audience it’s a genuine step-change tool.
ChatForest researches AI tools and models. We do not have hands-on access to Cosmos 3 infrastructure — this review is based on NVIDIA’s published technical materials, HuggingFace model cards, and third-party reporting from HPC Wire, MarkTechPost, and the NVIDIA Developer Blog.