At a glance: Cohere Command A+, released May 20, 2026. 218B sparse MoE, 25B active parameters per token. Apache 2.0 license. Native citation generation. $2.50/$10.00 per million tokens on API. Artificial Analysis Intelligence Index: 37. Knowledge cutoff April 1, 2025. Part of our AI Models & Companies reviews.
Cohere’s most important announcement in Command A+ has nothing to do with a benchmark.
Cohere’s prior model, Command R+, shipped under CC-BY-NC 4.0 — meaning commercial deployment required an enterprise license from Cohere. You could evaluate it, research with it, and integrate it in non-commercial contexts, but the moment you shipped a product, you needed a licensing conversation. Command A+ drops that requirement entirely. Apache 2.0. Full commercial use. No revenue caps. No non-compete clauses. Take the weights, fine-tune on your classified data, deploy on an air-gapped server, ship a product. Cohere gets nothing from you except attribution in the license file.
That licensing shift, paired with a 218B sparse MoE architecture that runs on two H100 GPUs, makes Command A+ the most practically deployable open-weight enterprise model in the market. Whether that combination justifies a $2.50/$10.00 API price tag — against alternatives that are cheaper or more capable on raw benchmarks — depends on what you’re building.
Release Context
Command A+ was announced May 20, 2026 in a blog post titled “Introducing Command A+: Our Most Open, Capable, and Efficient Model Yet.”
The release is a substantial departure from Cohere’s prior release cadence. Command A (the predecessor) was released in March 2025 under CC-BY-NC 4.0 — the same terms as Command R+. Command A Reasoning, a thinking-mode variant, arrived later under similar restrictions. Command A+ collapses both into a single model with a fundamentally different license.
Predecessor comparison:
| Model | License | Architecture | Multimodal |
|---|---|---|---|
| Command R+ | CC-BY-NC 4.0 | Dense | No |
| Command A | CC-BY-NC 4.0 | MoE | No |
| Command A Reasoning | CC-BY-NC 4.0 | MoE | No |
| Command A+ | Apache 2.0 | Sparse MoE | Yes |
The licensing transition is the headline. The multimodal addition — Cohere’s first — is the capability headline. Both matter.
Architecture
Sparse MoE with Selective Quantization
Command A+ uses a sparse Mixture-of-Experts decoder with 218 billion total parameters organized across 128 experts. Each token activates 8 experts plus 1 shared expert — meaning 25 billion parameters are active per inference call, not 218 billion. This is the same architectural efficiency principle as Mistral Small 4 (6B of 119B active) or DeepSeek V4 (49B of 1.6T active): total parameter count overstates the compute cost of any single forward pass.
The genuinely innovative piece in Command A+ is Quantization-Aware Distillation (QAD): rather than applying uniform quantization across the model, QAD preserves attention pathway weights at full precision while quantizing only the MoE expert layers. The result is near-lossless W4A4 quantization — Cohere claims the compressed model retains quality while running on significantly less hardware. Independent verification of “near-lossless” claims will take time to accumulate from the research community, but the hardware footprint reduction is concrete.
W4A4 performance numbers (Cohere-reported):
- Output throughput: 375 tokens/second
- Time-to-First-Token: 113ms
- Hardware: 2×H100 or 1×Blackwell B200
For comparison: Command A Reasoning ran at ~237 tokens/second on the same inference setup. The QAD-compressed Command A+ is 63% faster than its predecessor while fitting on fewer GPUs.
Specifications
| Specification | Value |
|---|---|
| Total parameters | 218B |
| Active parameters per token | 25B (8 of 128 experts + 1 shared) |
| Context window | 128,000 tokens input / 64,000 output |
| Modalities | Text + image input; text output |
| Languages | 48 (up from 23 in Command A) |
| Architecture | Sparse MoE decoder |
| Min. hardware (W4A4) | 2×H100 80GB or 1×NVIDIA B200 |
| Knowledge cutoff | April 1, 2025 |
The Apache 2.0 Licensing Story
The license is what builders should read carefully before anything else.
Apache 2.0 means:
- Commercial use with no restrictions
- Modification and redistribution permitted
- No revenue caps (unlike Mistral’s $20M/month Modified MIT)
- No non-compete clauses (unlike some Meta Llama licenses)
- Fine-tuning and derivative models permitted under same terms
- Air-gapped deployment — weights, quantization files, and all artifacts on HuggingFace are fully downloadable
What this unlocks for enterprise:
The practical unlock is in regulated industries. A hospital system building clinical decision support can fine-tune Command A+ on proprietary patient data without those data ever leaving their infrastructure. A financial institution can deploy Command A+ on hardware that has never touched the public internet. A defense contractor can run Command A+ on a classified network. None of these use cases were legally clean with CC-BY-NC 4.0 without a separate enterprise agreement.
Comparison to the competition:
| Model | License | Commercial use |
|---|---|---|
| Command A+ | Apache 2.0 | Unrestricted |
| Mistral Large 3 | Apache 2.0 | Unrestricted |
| Mistral Medium 3.5 | Modified MIT | Revenue cap ($20M/month) |
| Meta Llama 4 Scout/Maverick | Llama Community | MAU restrictions |
| Gemma 4 | Gemma Terms of Service | Some restrictions |
| DeepSeek V4 | MIT | Unrestricted |
Command A+ and Mistral Large 3 share the most open licensing profile among frontier-adjacent models. What differentiates them is Cohere’s enterprise feature set: primarily the native citation system.
Native Citations
This is the genuinely unique feature in Command A+, and it’s worth explaining in detail.
Most RAG pipelines handle citation attribution as a post-processing step: the model generates an answer, then a separate component matches phrases in the answer to source chunks. This is fragile — the phrase matching can fail, the model can generate plausible-sounding content that doesn’t map cleanly to a source, and the citation layer adds latency and infrastructure complexity.
Command A+ generates citations during inference using <co> / </co> tags that wrap factual claims directly in the output. Each tag links to a source document or database row by reference. There is no post-processing step — the model is trained to attribute as it generates.
Practical implications:
- Every factual claim in a response carries a machine-readable citation to its source
- Suitable for direct use in legal documents, clinical notes, financial reports — contexts where unsourced AI-generated claims are a liability
- Reduces the engineering complexity of building citation-aware RAG pipelines
- Cohere reports 20% improvement in “Agentic Question Answering accuracy” over Command A Reasoning, partly attributed to multimodal document understanding feeding into more accurate citations
For teams building in regulated industries, this is a meaningfully different capability from what other open-weight models provide. No other major open-weight model ships with architecture-level citation generation trained into the model itself.
Benchmarks
Agentic Performance: The Highlight Numbers
| Benchmark | Command A | Command A+ | Change |
|---|---|---|---|
| τ²-Bench Telecom | 37% | 85% | +48 points |
| τ²-Bench Hard | 3% | 25% | +22 points |
| AIME 25 (math) | 57% | 90% | +33 points |
| Agentic QA Accuracy | baseline | +20% | — |
| Spreadsheet Analysis | baseline | +32% | — |
These are generation-over-generation improvements, not absolute comparisons to frontier models. The τ²-Bench Telecom jump from 37% to 85% is the headline: Cohere’s own agentic benchmark shows a substantial capability increase for sustained multi-step tool use.
Broader Benchmarks
| Benchmark | Command A+ | Context |
|---|---|---|
| MMMU Pro | 63% | Multimodal reasoning |
| MMMU | 75.1% | Multimodal understanding |
| MathVista | 80.6% | Math + visual |
| CharXiv Reasoning | 52.7% | Chart reasoning |
| HLE | ~11% | Scientific reasoning |
| GPQA Diamond | 76% | Graduate-level science |
| AI Intelligence Index (Artificial Analysis) | 37 | Composite |
What the Intelligence Index Tells You
The Artificial Analysis Intelligence Index composite score of 37 is the most important third-party number for builders benchmarking Command A+ against alternatives. For context:
| Model | Intelligence Index |
|---|---|
| GPT-5.5 | 60 |
| Claude Opus 4.7 | 57 |
| Gemini 3.1 Pro | 57 |
| Mistral Medium 3.5 | 39 |
| Command A+ | 37 |
| DeepSeek V4 | — |
Command A+ scores just below Mistral Medium 3.5 on the composite index, and meaningfully below the frontier tier. For tasks that require deep reasoning, graduate-level science (11% on HLE), or coding (Cohere explicitly recommends GPT-5.5 or Claude 4.7 instead), Command A+ is not the best available option.
The benchmark profile is consistent with a model optimized for enterprise tool use and RAG workflows rather than raw reasoning capability. That’s a legitimate product positioning — just not a frontier-tier general-purpose claim.
Pricing
Cohere API:
- Input: $2.50 per million tokens
- Output: $10.00 per million tokens
| Model | Input ($/M) | Output ($/M) | License |
|---|---|---|---|
| Command A+ | $2.50 | $10.00 | Apache 2.0 |
| Command R+ (predecessor) | $0.93 | $1.86 | CC-BY-NC |
| GPT-5.5 | $5.00 | $20.00 | Closed |
| Claude Opus 4.7 | $25.00 | $125.00 | Closed |
| Mistral Medium 3.5 | $1.50 | $7.50 | Modified MIT |
| DeepSeek V4 | $0.27 | $1.10 | MIT |
The Cohere API pricing positions Command A+ above Mistral Medium 3.5 on output cost and far above DeepSeek V4. The justification is the citation and enterprise feature set, not raw capability.
For teams self-hosting the Apache 2.0 weights: inference cost is whatever your hardware costs. At W4A4 on 2×H100, the marginal cost of 10 million output tokens is compute amortization — not $100 in API fees. Self-hosting is where the Apache 2.0 license and efficient quantization combination delivers the most value.
Self-Hosting
The “runs on 2×H100” headline is accurate for the W4A4 quantized version. Practical details:
- BF16 full precision: ~8×H100 80GB (full precision MoE inference requires more memory due to all expert weights being loaded)
- FP8: ~4×H100 80GB
- W4A4 (QAD quantized): 2×H100 80GB or 1×NVIDIA B200 — the deployment target for the headline claim
At W4A4, Cohere reports 375 tokens/second output throughput and 113ms time-to-first-token. For most agentic applications (tool use, RAG, document processing), 375 t/s is more than adequate.
Available on HuggingFace (CohereLabs):
- BF16 weights:
CohereLabs/command-a-plus-05-2026 - FP8:
CohereLabs/command-a-plus-05-2026-fp8 - W4A4: quantized via QAD (Cohere’s tooling)
vLLM and standard Transformers inference are supported from day one. Azure AI Foundry integration confirmed at launch.
Multimodal
Command A+ is Cohere’s first model to accept image input. The vision capability is focused on document and spreadsheet processing — the 32% improvement in spreadsheet analysis and 20% improvement in agentic QA accuracy are the multimodal capability claims Cohere leads with.
This is not a general image understanding model in the style of GPT-4o or Gemini 3.1 Pro. The MMMU scores (75.1%) are competitive but not leading. The practical use case is: upload a PDF, a chart, or a spreadsheet alongside a text query, and Command A+ will reason across both. For enterprise document workflows, that’s the capability that matters. For general multimodal tasks, frontier alternatives are more capable.
Comparison to the Cohere Ecosystem
vs. Command R+ (the predecessor): Command A+ is faster, multimodal, more capable on agentic benchmarks, more permissively licensed, and supports 48 vs. 23 languages. Command R+ should be retired for new projects.
vs. Command A Reasoning: Command A+ replaces Command A Reasoning as the flagship. The capability improvements (τ²-Bench +48 points, AIME +33 points) are substantial. The license upgrade (Apache 2.0 vs. CC-BY-NC) is the procurement argument.
The self-hosting thesis: Cohere’s bet with Command A+ is that enterprise builders who need data sovereignty and citation-quality RAG will self-host rather than use the API. Apache 2.0 + W4A4 efficiency + native citations + 48 languages is a package designed for that buyer. The API exists for teams that want managed inference without the hardware commitment.
Who Should Use Command A+
Strong fit:
- Regulated industry RAG pipelines (legal, finance, healthcare) that need machine-readable citations on factual claims
- Sovereign AI deployments requiring air-gapped or on-premise infrastructure with no vendor agreement
- Multilingual enterprise applications (48 languages, including several not well-served by US-headquartered models)
- Teams whose primary bottleneck is licensing flexibility, not raw reasoning capability
Weak fit:
- Coding-heavy applications — Cohere explicitly recommends GPT-5.5 or Claude 4.7 instead
- Graduate-level scientific reasoning — HLE at 11% is a significant gap vs. frontier models
- Latency-sensitive consumer applications — 113ms TTFT is competitive but not leading
- Knowledge cutoff sensitive applications — April 2025 is 13 months old at launch; if recency matters, check alternatives
Rating: 3.5/5
What earns the 3.5:
The Apache 2.0 licensing is a genuine unlock — particularly for regulated industries and government deployments where CC-BY-NC required a separate agreement. Native citations are architecturally unique in the open-weight space: no other comparable model ships citation generation as a trained capability rather than a post-processing layer. The QAD quantization achieving 375 t/s on 2×H100 is a real infrastructure story, not a marketing claim. τ²-Bench Telecom at 85% (from 37%) represents a substantial agentic capability improvement for multi-step tool use.
What limits the score:
The AI Intelligence Index of 37 places Command A+ in the capable-but-not-frontier tier — the same band as Mistral Medium 3.5 (39) but well below the 57-60 cluster where GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro sit. The knowledge cutoff of April 1, 2025 was already 13 months old at launch — a meaningful gap for applications requiring current information. API pricing at $2.50/$10.00 is more expensive than Mistral Medium 3.5 ($1.50/$7.50) despite lower benchmark scores. And for the most common reason builders reach for open-weight models — cost reduction on coding and reasoning tasks — Cohere explicitly concedes the ground to competitors.
Command A+ is the right model for a specific buyer profile: enterprise teams who need data sovereignty, machine-readable citations, and Apache 2.0 licensing for regulated industry deployments. For everyone else, the capability-to-cost profile relative to DeepSeek V4, Mistral alternatives, or the Qwen 3.x family requires more justification.
Quick Reference
| Company | Cohere (Toronto, Canada) |
| Released | May 20, 2026 |
| Parameters | 218B total / 25B active per token |
| Architecture | Sparse MoE (128 experts, 8+1 active) |
| Context | 128K input / 64K output |
| Modalities | Text + image input; text output |
| Languages | 48 |
| τ²-Bench Telecom | 85% |
| AIME 25 | 90% |
| MMMU | 75.1% |
| Intelligence Index | 37 (Artificial Analysis composite) |
| API pricing | $2.50 input / $10.00 output per M tokens |
| License | Apache 2.0 (no restrictions) |
| HuggingFace | CohereLabs/command-a-plus-05-2026 |
| Min. hardware | 2×H100 80GB (W4A4) or 1×B200 |
| Knowledge cutoff | April 1, 2025 |
| Rating | 3.5 / 5 |
Related ChatForest reviews: Cohere Enterprise Platform · Mistral Medium 3.5 · Mistral Large 3 · DeepSeek V4 · Mistral Small 4