Google's ICLR Paper Faces Plagiarism Allegations From ETH Zurich Researchers
TurboQuant, a Google Research paper accepted to ICLR 2026 that promises to cut LLM memory usage by 6x, is now embroiled in a controversy over attribution and fair benchmarking. Researchers from ETH Zurich are publicly accusing the Google team of mischaracterizing their prior work and presenting misleading performance comparisons.
TurboQuant, unveiled on March 24-25, 2026, is a training-free vector quantization algorithm that compresses the key-value (KV) cache of large language models. The paper claims 6x memory reduction and up to 8x speedup on NVIDIA H100 GPUs with no measurable accuracy loss across long-context benchmarks.
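For readers unfamiliar with the technique, KV-cache quantization replaces the full-precision key and value tensors stored during generation with low-bit codes. The sketch below shows the simplest version of the idea, uniform per-row quantization to 4-bit codes; it is an illustrative toy, not Google's TurboQuant algorithm, and the function names are our own:

```python
import numpy as np

def quantize_kv(x, bits=4):
    """Uniform per-row quantization of a KV-cache tensor.
    Illustrative toy, not the TurboQuant algorithm."""
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / levels
    # 4-bit codes, stored here in a uint8 for simplicity
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_kv(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

# 32 cached key vectors of dimension 128, e.g. one attention head
rng = np.random.default_rng(0)
keys = rng.standard_normal((32, 128)).astype(np.float32)

codes, scale, lo = quantize_kv(keys, bits=4)
recon = dequantize_kv(codes, scale, lo)

# fp16 -> 4-bit codes is a 4x reduction, ignoring the small
# per-row scale/offset overhead
rel_err = np.abs(recon - keys).max() / np.abs(keys).max()
print(f"max relative reconstruction error: {rel_err:.3f}")
```

Real systems go further than this: vector quantization (as in TurboQuant and RaBitQ) codes whole vectors jointly rather than rounding each coordinate independently, which is what makes much more aggressive compression possible.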
The Core Dispute
Jianyang Gao, a postdoctoral researcher at ETH Zurich and first author of the RaBitQ papers, has challenged TurboQuant's characterization of prior work. RaBitQ (Randomized Binary Quantization) is a quantization method developed at ETH Zurich that uses a shared random rotation to achieve asymptotically optimal compression error bounds.
"We are posting this comment to create a public record because the public discussion and promotion of TurboQuant have already created substantial confusion about its relationship to our RaBitQ line of work," Gao wrote in a Reddit post.
The TurboQuant paper describes RaBitQ as a "grid-based PQ method" with "suboptimal" theoretical guarantees due to "loose analysis," while omitting any mention of RaBitQ's shared random rotation mechanism. Gao's team counters that their extended work, published at a top theoretical computer science venue, rigorously proved that RaBitQ's compression error reaches the mathematically optimal bound.
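For intuition, the shared-rotation idea at the center of the dispute can be sketched in a few lines. A single random orthogonal rotation is shared by every vector, so it is stored once; each vector is then reduced to one sign bit per dimension plus its norm. This is a deliberate simplification of the RaBitQ family, not its actual implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
d = 256

# Shared random rotation: an orthogonal matrix from the QR decomposition
# of a Gaussian matrix. One rotation serves all vectors.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

def encode(x):
    """1 bit per dimension plus one scalar norm (illustrative only)."""
    return np.sign(R @ x), np.linalg.norm(x)

def decode(bits, norm):
    # Undo the rotation; bits / sqrt(d) is a unit vector that
    # approximates the direction of R @ x.
    return norm * (R.T @ bits) / np.sqrt(d)

x = rng.standard_normal(d)
bits, norm = encode(x)
x_hat = decode(bits, norm)

cos = x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat))
# Expected cosine similarity approaches sqrt(2/pi) ~ 0.80 in high dimension
print(f"cosine similarity after 1-bit coding: {cos:.3f}")
```

The rotation matters because it spreads a vector's energy evenly across coordinates, which is what makes crude one-bit-per-dimension codes carry a provable amount of geometric information; the ETH Zurich team's complaint is precisely that this mechanism was omitted from TurboQuant's description of their method.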
Unfair Benchmarks?
Beyond attribution issues, the ETH Zurich team is questioning TurboQuant's experimental methodology. According to Gao, the TurboQuant paper's speed comparisons used problematic approaches:
- RaBitQ was tested using a Python translation created by the TurboQuant team rather than RaBitQ's official optimized C++ code
- RaBitQ ran on a single CPU core with multithreading disabled, while TurboQuant ran on an NVIDIA A100 GPU
- These conditions created misleadingly large performance gaps without adequate disclosure
Emails from March 2025 show that Majid Daliri from the TurboQuant team contacted RaBitQ's authors requesting debugging assistance for a Python version based on RaBitQ's source code, indicating detailed familiarity with the prior method.
What TurboQuant Actually Delivers
Despite the controversy, TurboQuant does present genuine technical contributions. The algorithm combines two techniques: PolarQuant (presented at AISTATS 2026) achieves MSE-optimal quantization by exploiting the geometry of high-dimensional space, while Quantized Johnson-Lindenstrauss (QJL) adds a 1-bit transform on residuals to produce unbiased inner product estimates.
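The QJL idea can be illustrated independently of TurboQuant: project a key vector through a shared random Gaussian matrix, keep only the signs (1 bit per projection) plus the norm, and estimate inner products against unquantized queries. The sketch below follows the standard form of this estimator from the literature; it is our own illustration, and the details of how TurboQuant applies it to residuals are not shown:

```python
import numpy as np

rng = np.random.default_rng(7)
d, m = 64, 8192  # vector dimension, number of random projections

# Shared Gaussian sketch matrix
S = rng.standard_normal((m, d))

def qjl_encode(k):
    """Store only the signs of the sketch plus the norm: 1 bit per
    projection (illustrative sketch of the QJL idea)."""
    return np.sign(S @ k), np.linalg.norm(k)

def qjl_inner(q, bits, norm):
    # Unbiased estimate of <q, k>, using the identity
    # E[<S q, sign(S k)>] = m * sqrt(2/pi) * <q, k> / ||k||
    return norm * np.sqrt(np.pi / 2) / m * ((S @ q) @ bits)

q = rng.standard_normal(d)
k = rng.standard_normal(d)
bits, norm = qjl_encode(k)

est = qjl_inner(q, bits, norm)
print(f"true <q,k> = {q @ k:.2f}, 1-bit estimate = {est:.2f}")
```

The appeal for attention computation is that the estimate is unbiased and its error shrinks as the number of projections grows, so attention scores computed against heavily compressed keys stay close to their full-precision values on average.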
Benchmarks show 100% retrieval accuracy on Needle-In-A-Haystack at up to 104k tokens under 4x compression, matching full-precision quality on Llama-3.1-8B-Instruct and Ministral-7B-Instruct. Google plans to release an official implementation in Q2 2026.
The controversy highlights growing tensions around academic attribution in AI research, particularly as quantization techniques become increasingly important for making LLM inference more efficient.