Zhipu's GLM-5 tops open-source benchmarks with a novel async RL framework called SLIME.
Feb 12, 2026
What changed in the API, what failed during agent testing, and how to design tool-using systems that stay inside the rails.
Feb 11, 2026
Anthropic's randomized trial with 52 developers. But some interaction patterns beat hand-coding.
Feb 9, 2026
GPT-5.3-Codex debugged its own training. It also triggered OpenAI's first "High" cybersecurity rating.
Feb 7, 2026
Anthropic's new flagship outperforms GPT-5.2 by 144 Elo on knowledge work. Here's the technical breakdown.
Feb 6, 2026