AI Haven
AI News

OpenAI Launches GPT-5.4 Mini and Nano With 54% on SWE-Bench Pro

OpenAI releases GPT-5.4 mini and nano with 54.4% on SWE-Bench Pro, nearly matching flagship performance at half the cost.

March 23, 2026

Smaller models, fewer tradeoffs

OpenAI has released GPT-5.4 mini and nano, its latest efficient alternatives to the flagship GPT-5.4 model. The new models launched on March 17, 2026, and are already generating buzz in the developer community for delivering near-flagship performance at a fraction of the cost.

The mini variant achieves 54.4% on SWE-Bench Pro, while the nano model scores 52.4%. That's a significant jump from the previous GPT-5 mini's 45.7%, putting these smaller models within striking distance of the full GPT-5.4 (57.7%).

Pricing and availability

GPT-5.4 mini is available via API at $0.75 per million input tokens and $4.50 per million output tokens. The nano model is even cheaper, positioned as the lightest-weight option for high-volume workloads.
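At those rates, per-request cost is straightforward to estimate. A minimal sketch using the published mini prices (the token counts in the example are made-up, not from the announcement):

```python
# Estimate GPT-5.4 mini API cost from the published per-million-token rates.
MINI_INPUT_PER_M = 0.75   # USD per 1M input tokens (from the announcement)
MINI_OUTPUT_PER_M = 4.50  # USD per 1M output tokens (from the announcement)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at GPT-5.4 mini rates."""
    return (input_tokens * MINI_INPUT_PER_M
            + output_tokens * MINI_OUTPUT_PER_M) / 1_000_000

# Example: a 10k-token prompt with a 2k-token completion.
print(f"${request_cost(10_000, 2_000):.4f}")  # $0.0165
```

Because output tokens cost 6× as much as input tokens, completion length, not prompt size, dominates the bill for most workloads.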

Both models are live in OpenAI's API, Codex, and ChatGPT. GPT-5.4 mini is available to ChatGPT users on the free and Go tiers (via the Thinking option or as a fallback), while nano is API-only.

Performance and capabilities

The GPT-5.4 mini runs more than 2× faster than its predecessor, achieving approximately 180-190 tokens per second. It features a 400,000-token context window in the API, making it suitable for large codebase analysis.
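That throughput figure translates directly into wall-clock latency for generation-bound workloads. A back-of-the-envelope sketch, taking 185 tokens/sec as the midpoint of the reported range and ignoring time-to-first-token:

```python
# Rough generation-time estimate from the reported ~180-190 tokens/sec.
TOKENS_PER_SEC = 185  # midpoint of the reported 180-190 tok/s range

def generation_seconds(output_tokens: int,
                       tokens_per_sec: float = TOKENS_PER_SEC) -> float:
    """Seconds to stream `output_tokens` at a steady decode rate
    (ignores time-to-first-token and network overhead)."""
    return output_tokens / tokens_per_sec

# A 2,000-token completion streams in roughly 10.8 seconds.
print(f"{generation_seconds(2_000):.1f}s")  # 10.8s
```

By the same arithmetic, the previous generation at under half this speed would need well over 20 seconds for the same completion, which is the latency complaint these releases target.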

On coding benchmarks specifically:

  • OSWorld-Verified: Mini scores 72.1% vs full GPT-5.4's 75.0%
  • SWE-Bench Pro: Mini at 54.4%, Nano at 52.4%

The mini model is optimized for coding, computer use, tool calling, and real-time image reasoning. The nano variant is built strictly for text and simple tools—ideal for classification, data extraction, ranking, and coding subagents.

What this means for developers

With these releases, OpenAI is clearly targeting the high-volume, latency-sensitive workload market. The near-parity with the flagship model at roughly half the cost makes the mini variant a compelling choice for production applications that don't require the absolute maximum capability.

The 2× speed improvement over the previous generation also addresses one of the key complaints about smaller models—inference latency. For teams building real-time applications or agentic workflows, these tradeoffs are increasingly favorable.

Source: OpenAI Blog