AI Haven

Tesslate's OmniCoder-9B shows small models can excel at coding agents

Tesslate's OmniCoder-9B achieves a 61% improvement over Qwen3.5-9B on coding benchmarks while running at 40 tokens per second on consumer hardware with 8 GB of VRAM.

March 13, 2026

Startup releases 9B coding agent model that outperforms expectations on consumer hardware

Tesslate, an AI startup founded in early 2025, has released OmniCoder-9B, a specialized coding agent model that is turning heads in the local LLM community. Fine-tuned on over 425,000 agentic coding trajectories from Claude Opus 4.6, GPT-5.4, and Codex reasoning traces, the 9-billion parameter model demonstrates strong agentic behaviors despite its relatively small size.

The model is built on Qwen3.5-9B's hybrid architecture, which interleaves Gated Delta Networks with standard attention layers. According to results posted on the model's Hugging Face page, OmniCoder-9B achieves a 23.6% pass rate on Terminal-Bench 2.0—a 61% improvement over the base Qwen3.5-9B model, which scored 14.6%.
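The 61% figure is a relative improvement over the base model's score, not a gain in percentage points. A quick check of the arithmetic:

```python
base = 14.6   # Qwen3.5-9B pass rate on Terminal-Bench 2.0 (%)
tuned = 23.6  # OmniCoder-9B pass rate (%)

# Relative improvement: (new - old) / old
rel = (tuned - base) / base
print(f"{rel:.1%}")  # 61.6%, reported as roughly 61%
```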

What makes this release notable is its accessibility. Users on Reddit report running the model at roughly 40 tokens per second using Q4_K_M quantization on consumer hardware with just 8GB of VRAM. The model supports up to 100K context length and exhibits behaviors like error recovery, responding to LSP diagnostics, and using proper edit diffs instead of full file rewrites.

"I was getting something useful with OpenClaw but OmniCoder-9B just completes my test tasks flawlessly and it was fast as fuck," one Reddit user wrote in a post that received over 100 upvotes.

Tesslate, which first gained attention with a viral open-source UI model on Reddit and Hugging Face, has since passed 60,000 model downloads. The company positions itself as an "AI-native development OS" focused on full-stack app generation from natural language prompts.

OmniCoder-9B is available in full precision on Hugging Face (~17.9 GB BF16) along with GGUF quantizations for running locally with llama.cpp. Community members have also created 4-bit and 5-bit quantized versions.
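The reported sizes line up with a back-of-the-envelope estimate of weight-only memory. The bits-per-weight figure for Q4_K_M below is an approximation, not a number from the model card:

```python
PARAMS = 9e9  # 9 billion parameters

def footprint_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight-only memory footprint in decimal GB."""
    return params * bits_per_weight / 8 / 1e9

# BF16 stores 16 bits per weight
print(f"BF16:   {footprint_gb(16):.1f} GB")   # 18.0 GB, close to the ~17.9 GB listing
# Q4_K_M averages roughly 4.8 bits per weight (mixed 4/6-bit blocks) -- an assumption
print(f"Q4_K_M: {footprint_gb(4.8):.1f} GB")  # ~5.4 GB of weights
```

At roughly 5.4 GB of weights, a Q4_K_M build leaves room in an 8 GB card for the KV cache and runtime overhead, consistent with the user reports above.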

Source: Reddit r/LocalLLaMA