Open-weight MoE model delivers strong reasoning performance with only 3B activated parameters
NVIDIA has released Nemotron-Cascade-2-30B-A3B, an open-weight 30-billion-parameter Mixture-of-Experts model with 3 billion activated parameters, the company announced on March 20, 2026. The model achieves gold medal-level performance on the 2025 International Mathematical Olympiad (IMO), International Olympiad in Informatics (IOI), and ICPC World Finals while using significantly fewer parameters than frontier models.
The model builds on the Nemotron-3-Nano-30B-A3B base using Cascade RL, a sequential domain-wise reinforcement learning approach, combined with Multi-Domain On-Policy Distillation (MOPD) for training stability. It supports a 1 million token context window and includes a "thinking mode" activated via specific chat templates for complex problem-solving tasks.
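To illustrate why a 30B-parameter MoE model activates only about 3B parameters per token, here is a minimal sketch of standard top-k expert routing. This is a generic illustration, not NVIDIA's implementation; all names, dimensions, and the choice of a ReLU feed-forward expert are assumptions for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route a token through only its top-k experts (generic MoE sketch).

    x: (d,) token hidden state
    gate_w: (d, n_experts) router weights
    experts: list of (W1, W2) feed-forward weight pairs
    """
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]           # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                    # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        W1, W2 = experts[i]
        out += w * (np.maximum(x @ W1, 0) @ W2)  # each expert is a small ReLU FFN
    return out, top

# Toy dimensions (hypothetical, far smaller than the real model):
rng = np.random.default_rng(0)
d, n_experts, d_ff = 8, 16, 32
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
           for _ in range(n_experts)]
x = rng.normal(size=d)
y, used = moe_forward(x, gate_w, experts, top_k=2)
# Only 2 of 16 experts run for this token, so only a fraction of the
# total expert parameters is ever activated per forward pass.
```

Because the router selects a fixed small number of experts per token, compute and activated-parameter count scale with top_k rather than with the total expert count, which is how a 30B model can run with roughly 3B active parameters.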
Benchmark Performance
In head-to-head comparisons with Qwen3.5-35B-A3B (released February 2026), Nemotron Cascade 2 demonstrates clear advantages across reasoning, coding, and instruction-following benchmarks:
- LiveCodeBench v6: 87.2% vs 74.6% (+12.6 points)
- AIME 2025: 92.4% vs 91.9%
- IOI 2025: 439.28 vs 348.6 (total points)
- ArenaHard v2: 83.5 vs 65.4
- IFBench: 82.9 vs 70.2
The model shows particular strength in coding tasks, outperforming Qwen3.5-35B-A3B by over 12 percentage points on LiveCodeBench v6. However, it lags behind in some knowledge-intensive tasks and certain agentic benchmarks like BFCL v4, where Qwen3.5 leads 67.3% to 52.9%.
Availability
Nemotron-Cascade-2-30B-A3B is available on Hugging Face under the NVIDIA Open Model License. Early user tests on NVIDIA DGX Spark hardware report throughput of approximately 31-55 tokens per second with NVFP4 quantization. The release follows NVIDIA's announcement of the Nemotron Coalition at GTC 2026, a partnership with eight leading AI labs to co-develop open frontier models.