AI News

LLM Updates News

Large language model releases and capability changes.

LLM Updates · Apr 3, 2026

Google DeepMind Releases Gemma 4 With Four Model Sizes and Apache 2.0 License

Google DeepMind releases Gemma 4 with four model sizes, multimodal capabilities, and Apache 2.0 licensing. The 31B variant ranks #3 on Arena AI at 1452 Elo.

Source: HuggingFace Blog
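An Elo-style rating such as 1452 is most meaningful as a gap: the difference between two ratings maps to an expected head-to-head preference rate. The sketch below assumes the standard logistic Elo formula (base 10, 400-point scale). The 1400-rated opponent is hypothetical and chosen only for illustration.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Hypothetical matchup: a 1452-rated model against an assumed 1400-rated model.
p = expected_win_rate(1452, 1400)
print(f"{p:.3f}")  # 0.574: a 52-point gap means roughly a 57% expected preference rate
```

Equal ratings yield 0.5, and every 400 points of difference multiplies the odds by 10.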

LLM Updates · Apr 2, 2026

Qwen3.6 Plus Brings 1 Million Token Context and Major Reasoning Upgrades

Alibaba releases Qwen3.6 Plus with 1M token context and major efficiency upgrades, available free on OpenRouter.

Source: News from OpenRouter

LLM Updates · Apr 1, 2026

Anthropic's Claude Code Leaks Source Code for the Second Time in a Year

Anthropic's Claude Code CLI leaked roughly 512,000 lines of source code via npm, the second such incident in a year. The leak exposed unreleased features such as the KAIROS autonomous agent mode.

Source: The Hacker News / Multiple sources

LLM Updates · Mar 31, 2026

llama.cpp Hits 100k GitHub Stars, Marking a Local AI Milestone

llama.cpp hit 100k GitHub stars in March 2026, marking a milestone for local AI. The C++ library enables running LLMs offline without API costs.

Source: Reddit (r/LocalLLaMA)

LLM Updates · Mar 30, 2026

Google's ICLR Paper Faces Plagiarism Allegations From ETH Zurich Researchers

ETH Zurich researchers accuse Google's ICLR 2026 paper TurboQuant of mischaracterizing their RaBitQ work and of relying on unfair CPU-versus-GPU benchmark comparisons. The controversy exposes tensions around attribution in AI compression research.

Source: LocalLLaMA / MachineLearning subreddits

LLM Updates · Mar 28, 2026

Google TurboQuant Achieves 6x LLM Memory Reduction Without Quality Loss

Google Research's TurboQuant algorithm compresses LLM KV cache memory by 6x while delivering up to an 8x speedup with zero accuracy loss, targeting the KV cache bottleneck in long-context inference.

Source: TechCrunch

LLM Updates · Mar 27, 2026

Mistral's Voxtral TTS Beats ElevenLabs in Human Preference Tests

Mistral AI releases Voxtral TTS, a 3B-parameter open-source text-to-speech model that beats ElevenLabs Flash v2.5 in human preference tests with a 68.4% win rate.

Source: TechCrunch

LLM Updates · Mar 26, 2026

Google's TurboQuant Cuts LLM Memory Use by 6x With No Accuracy Loss

Google Research announces TurboQuant, a memory compression algorithm that reduces LLM KV cache size by 6x with zero accuracy loss and up to 8x speedup on H100 GPUs.

Source: Google Research / TechCrunch
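To put a 6x KV cache reduction in concrete terms, the arithmetic below estimates cache size for a hypothetical transformer (32 layers, 8 KV heads, head dimension 128, a 128k-token context, 16-bit keys and values). These dimensions are assumptions, not the configuration of any model named above. If the 6x figure is measured against a 16-bit baseline (also an assumption), it corresponds to roughly 2.7 bits per stored element. This sketch does not model how TurboQuant itself quantizes.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_elem: float) -> float:
    """Approximate KV cache size in bytes for a single sequence."""
    # Factor of 2: one key vector and one value vector per KV head, per layer, per token.
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len
    return elems * bits_per_elem / 8

# Hypothetical model shape and context length (illustration only).
baseline = kv_cache_bytes(32, 8, 128, 131_072, 16)  # 16-bit keys/values
compressed = baseline / 6                           # the reported 6x reduction

print(f"baseline:   {baseline / GIB:.2f} GiB")       # 16.00 GiB
print(f"compressed: {compressed / GIB:.2f} GiB")     # 2.67 GiB
print(f"effective bits per element: {16 / 6:.2f}")   # 2.67
```

Because the cache grows linearly with context length, the same ratio matters most at long contexts, which is the bottleneck the item describes.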

LLM Updates · Mar 24, 2026

US Government Advisory Body Warns China's Open-Source AI Models Now Outpace US

The U.S.-China Economic and Security Review Commission (USCC) warns that China's open-source AI models now lead global usage, surpassing Meta's Llama in downloads and powering 80% of US AI startups.

Source: Reuters

LLM Updates · Mar 23, 2026

OpenAI Launches GPT-5.4 Mini and Nano With 54% on SWE-Bench Pro

OpenAI releases GPT-5.4 mini and nano with 54.4% on SWE-Bench Pro, nearly matching flagship performance at half the cost.

Source: OpenAI Blog

LLM Updates · Mar 22, 2026

NVIDIA Releases Nemotron Cascade 2 30B, Outperforms Qwen3.5 on Coding and Math Benchmarks

NVIDIA releases Nemotron-Cascade-2-30B-A3B, an open-weight 30B MoE model outperforming Qwen3.5 on coding and math benchmarks with only 3B activated parameters.

Source: MarkTechPost / NVIDIA Research
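The "30B-A3B" name reflects a mixture-of-experts (MoE) design in which a router activates only some experts for each token, so roughly 3B of the 30B parameters do work per token. The following is a toy sketch of generic top-k routing on a scalar input, not NVIDIA's implementation. The expert count, k, and router logits are made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

def top_k_moe(
    x: float,
    router_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int,
) -> Tuple[float, List[int]]:
    """Toy top-k MoE layer: run only the k highest-scoring experts and mix their outputs."""
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected experts' logits only (shifted by the max for stability).
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    y = sum((e / total) * experts[i](x) for e, i in zip(exps, chosen))
    return y, chosen

# Hypothetical setup: 8 experts that each scale their input; 2 are active per token.
experts = [lambda v, s=s: s * v for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
y, chosen = top_k_moe(1.0, logits, experts, k=2)
print(chosen)       # [1, 4]: the other 6 experts never run for this token
print(round(y, 3))  # 3.133

# Parameter arithmetic from the model name: 3B active out of 30B total.
print(f"active fraction: {3 / 30:.0%}")  # 10%
```

Per-token compute scales with the active parameters, while memory still has to hold all 30B.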

LLM Updates · Mar 16, 2026

Alibaba's Qwen3.5 Matches GPT-5 and Claude Opus on Benchmarks

Alibaba Cloud's Qwen3.5-397B-A17B uses sparse MoE to achieve 8.6x-19x faster inference while matching GPT-5.2 and Claude Opus on benchmarks.

Source: Alibaba Cloud / Qwen Blog

LLM Updates · Mar 15, 2026

IBM Releases Granite 4.0 1B Speech, a Compact Model for Edge Deployment

IBM releases Granite 4.0 1B Speech, a compact multilingual ASR model that ranks #1 on the Open ASR Leaderboard while running on edge devices.

Source: Hugging Face Blog
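ASR leaderboards such as the Open ASR Leaderboard rank models primarily by word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and a non-empty reference. It omits the text normalization a real evaluation would apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j] (Levenshtein over words).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.3f}")  # 0.333
```

Lower is better, so a #1 ranking means the lowest average WER across the leaderboard's test sets.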

LLM Updates · Mar 14, 2026

Meta Delays Avocado AI Model Launch to May After Benchmark Disappointment

Meta delays its Avocado AI model to May 2026 after internal benchmarks showed it could not match models from Google (Gemini 3.0), OpenAI, and Anthropic.

Source: The New York Times / Reuters

LLM Updates · Mar 13, 2026

Tesslate's OmniCoder-9B Shows Small Models Can Excel as Coding Agents

Tesslate's OmniCoder-9B posts a 61% improvement over Qwen3.5-9B on coding benchmarks while running at 40 tokens per second on consumer hardware with 8 GB of VRAM.

Source: Reddit r/LocalLLaMA
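The item does not say how OmniCoder-9B is quantized for an 8 GB card. The arithmetic below, assuming weights dominate memory use, shows why a 9B-parameter model would need roughly 4-bit weights to fit. It ignores the KV cache and runtime overhead, which add to the totals shown.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone (excludes KV cache and runtime overhead)."""
    return n_params * bits_per_weight / 8 / 1024 ** 3

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gib(9e9, bits):.1f} GiB")
# 16-bit: 16.8 GiB
#  8-bit: 8.4 GiB
#  4-bit: 4.2 GiB  (the only option here that leaves headroom on an 8 GB card)
```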

LLM Updates · Mar 12, 2026

llama.cpp Finally Adds Real Reasoning Budget Control for Hybrid Models

llama.cpp's new sampler mechanism gives real control over reasoning tokens in hybrid models such as Qwen3 and DeepSeek-R1, but benchmarks show that a poorly tuned budget can drop HumanEval scores from 94% to 78%.

Source: LocalLLaMA (Reddit)
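The idea behind a reasoning budget can be sketched without llama.cpp's actual options or internals, which this sketch does not reproduce. Count the tokens the model emits inside its reasoning block, and once the budget is spent, force the end-of-thinking token. This is a generic, hypothetical sketch: the `next_token` callback, token IDs, and the assumption that generation starts inside the reasoning block are all placeholders.

```python
from typing import Callable, List

def generate_with_reasoning_budget(
    next_token: Callable[[List[int]], int],  # hypothetical: returns next token id for a context
    prompt: List[int],
    think_end_id: int,   # hypothetical id of the end-of-thinking token
    eos_id: int,         # hypothetical end-of-sequence id
    budget: int,         # max reasoning tokens before the block is forcibly closed
    max_tokens: int,
) -> List[int]:
    """Assumes generation begins inside the model's reasoning block."""
    out: List[int] = []
    thinking, spent = True, 0
    while len(out) < max_tokens:
        if thinking and spent >= budget:
            tok = think_end_id  # budget exhausted: force the reasoning block to close
        else:
            tok = next_token(prompt + out)
        out.append(tok)
        if tok == eos_id:
            break
        if thinking:
            if tok == think_end_id:
                thinking = False  # model (or the budget) ended reasoning; answer phase begins
            else:
                spent += 1

    return out

# Hypothetical stand-in model: emits reasoning token 5 until it sees the
# end-of-thinking token (1) in its context, then emits end-of-sequence (2).
def fake_model(context: List[int]) -> int:
    return 2 if 1 in context else 5

print(generate_with_reasoning_budget(fake_model, [0], think_end_id=1, eos_id=2,
                                     budget=3, max_tokens=20))
# [5, 5, 5, 1, 2]
```

Forcing the closing token cuts reasoning off mid-stream, which is one plausible way a poorly tuned budget could lower benchmark accuracy.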

LLM Updates · Mar 11, 2026

GPT-5.1 Retired as OpenAI Pushes Forward With GPT-5.4

OpenAI retired GPT-5.1 from ChatGPT on March 11, angering users who prefer its reasoning to that of GPT-5.4. The newer model features 33% fewer errors and a 1M-token context window.

Source: OpenAI Blog / Reddit

LLM Updates · Mar 10, 2026

OpenAI Ships GPT-5.4 With 1M Token Context and New Reasoning Variants

OpenAI releases GPT-5.4 with a 1M-token context window, three variants, and benchmark gains including 75% on OSWorld, surpassing the human baseline.

Source: OpenAI Blog

LLM Updates · Mar 9, 2026

OpenAI Releases GPT-5.4 With Steerable Reasoning and 1M Token Context

OpenAI releases GPT-5.4 with three variants, a 1M-token context window, steerable reasoning, and a 67.3% WebArena score. The announcement also notes that reasoning models struggle to hide their chain of thought, a property described as a safety feature.

Source: OpenAI Blog

LLM Updates · Mar 8, 2026

Alibaba's Qwen 3.5 27B Challenges GPT-5 on Performance and Price

Alibaba's Qwen 3.5 27B posts benchmark results competitive with GPT-5 while running locally at 90 tokens per second, challenging closed models with open weights.

Source: LocalLLaMA / Rival.tips

LLM Updates · Mar 6, 2026

OpenAI Launches GPT-5.4 With Computer Agent Capabilities, Beats Human Baseline on OSWorld

OpenAI released GPT-5.4 with a score of 75% on OSWorld-Verified, beating the human baseline of 72.4%. The model features a 1M-token context window and native computer-use capabilities.

Source: OpenAI Blog