In partnership with

Your Entire Business Should Work as One

Most experts aren't limited by their expertise. They're slowed by scattered tools that don't connect. Starter platforms, point solutions, and mismatched workflows create chaos, cap revenue, and dilute credibility.

Kajabi is the operating system for human expertise. Identity, strategy, products, marketing, payments, and client transformation all working together in one place.

No more guessing. No more fragmentation. No more ceiling on your growth.

Clarity replaces confusion. Simplicity replaces complexity. Growth becomes intentional.

You move faster because everything connects. Built to scale, not stall.

If your expertise is real, your system should be too.

ResearchAudio.io

744B Parameters. 40B Active. China Built It on Huawei Chips.

Zhipu's GLM-5 tops open-source benchmarks with a novel async RL framework called slime.

Zhipu AI just released a 744-billion-parameter model trained entirely on domestic Chinese chips, led by Huawei's Ascend line, with zero NVIDIA hardware involved. GLM-5 uses a Mixture-of-Experts architecture that activates 40 billion parameters per token, and it currently leads all open-source models on Artificial Analysis' rankings across coding, reasoning, and agentic tasks.

The model scores 77.8% on SWE-bench Verified (approaching Claude Opus 4.5's 80.9%), dominates BrowseComp at 62.0 versus Claude Opus 4.5's 37.0, and ranks first among open-source models on Vending Bench 2, a simulation of running a business. All of this under an MIT license, with weights already live on HuggingFace.

744B total parameters · 40B active per token · 200K context window · 28.5T training tokens

Why This Matters

Zhipu AI (also known as Z.ai) spun out of Tsinghua University in 2019 and completed a Hong Kong IPO on January 8, 2026, raising approximately HKD 4.35 billion (around $558 million). That capital went directly into building GLM-5. The company is one of China's "AI tigers," a group of startups competing to lead the country's frontier AI development.

What makes this release notable: GLM-5 was trained entirely on domestically manufactured chips, including Huawei's Ascend and hardware from Moore Threads, Cambricon, and Kunlunxin. This is a proof point that frontier-scale model training no longer requires NVIDIA hardware. The model's predecessor, GLM-4.5, had 355 billion total parameters with 32 billion active. GLM-5 roughly doubles this to 744 billion total and 40 billion active.

Architecture: MoE with Sparse Token Routing

GLM-5 uses a Mixture-of-Experts (MoE) architecture with 256 expert sub-networks. For each token, a router selects 8 experts to activate, meaning about 5.4% of the model's total parameters fire per inference step. Think of it like a large hospital with 256 specialist doctors, but each patient sees just the 8 most relevant ones. This keeps the effective compute closer to a 40B dense model while retaining the knowledge capacity of a much larger one.
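
To make the routing concrete, here is a minimal PyTorch sketch of top-8-of-256 expert selection. The layer sizes, gating, and normalization are illustrative assumptions, not GLM-5's published configuration; the point is the mechanism: score all experts, keep the top k per token, and run only those experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to k of n experts.
    Sizes are illustrative, not GLM-5's actual configuration."""
    def __init__(self, d_model=64, d_ff=128, n_experts=256, k=8):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (tokens, d_model)
        scores = self.router(x)                      # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # pick the 8 best experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize over the chosen 8
        out = torch.zeros_like(x)
        for slot in range(self.k):                   # naive loop; real systems batch by expert
            for e in idx[:, slot].unique():
                hit = idx[:, slot] == e
                out[hit] += weights[hit, slot, None] * self.experts[e](x[hit])
        return out

moe = TopKMoE()
y = moe(torch.randn(4, 64))
print(y.shape)  # torch.Size([4, 64])
# Only 8 of 256 experts run per token (~3% of expert parameters in this toy);
# GLM-5's ~5.4% active share also counts attention, embeddings, and shared layers.
```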

For long-context processing, GLM-5 integrates DeepSeek Sparse Attention (DSA), the same mechanism DeepSeek introduced in its V3.2 models. DSA avoids the quadratic cost of dense attention over long sequences, enabling GLM-5 to handle up to 200,000 input tokens and generate up to 131,000 output tokens (among the highest in the industry) without the compute overhead that would normally accompany a model this large.
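
Below is a conceptual sketch of the top-k idea behind this kind of sparse attention: each query attends only to its k highest-scoring past tokens, turning O(n²) softmax work into roughly O(n·k). This is a toy illustration, not DeepSeek's or Zhipu's actual algorithm; a real implementation uses a lightweight indexer to pick tokens rather than the full score matrix computed here.

```python
import torch
import torch.nn.functional as F

def sparse_topk_attention(q, k, v, top_k=64):
    """Conceptual sparse attention: each query attends to only its top_k keys.
    Toy illustration of the idea behind DSA-style sparsity, not the real kernel.
    q, k, v: (n, d)"""
    n, d = q.shape
    scores = q @ k.T / d**0.5                       # (n, n); a real system replaces this
                                                    # full matrix with a cheap indexer
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    vals, idx = scores.topk(min(top_k, n), dim=-1)  # keep the best keys per query
    probs = F.softmax(vals, dim=-1)                 # softmax only over the kept keys
    return torch.einsum("nk,nkd->nd", probs, v[idx])  # weighted sum of selected values

n, d = 1024, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = sparse_topk_attention(q, k, v)                # each query mixes 64 values, not 1024
print(out.shape)  # torch.Size([1024, 64])
```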

Post-Training: slime Async RL

The most technically interesting part of GLM-5 is its post-training. Zhipu developed slime, a novel asynchronous reinforcement learning infrastructure designed to solve a specific bottleneck: in traditional RL for LLMs, trajectory generation (the process of having the model produce complete outputs for reward scoring) typically consumes over 90% of training time. This happens because all trajectories must complete before the next training step begins.

slime breaks this lockstep with a three-part modular design: a training module built on Megatron-LM, an asynchronous rollout module for trajectory generation, and a technique called Active Partial Rollouts (APRIL) that lets training proceed on partially completed trajectories rather than waiting for every trajectory to finish. It is analogous to a factory that starts assembling the next product while earlier ones are still in quality testing, instead of waiting for the entire batch to complete. The result is substantially faster RL iteration cycles, which enabled the fine-grained post-training behind GLM-5's agentic capabilities.
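
Here is a toy asyncio sketch of that decoupling, assuming nothing about slime's real API: rollout workers publish trajectory segments as they are produced, and the trainer consumes whatever is ready each step instead of blocking on the slowest trajectory.

```python
import asyncio
import random

# Toy sketch of the idea behind Active Partial Rollouts (APRIL): training
# consumes trajectory segments as they arrive instead of blocking until
# every rollout finishes. Names and numbers are illustrative, not slime's API.

async def rollout(traj_id: int, buffer: asyncio.Queue):
    """Generate one trajectory in chunks, publishing each partial segment."""
    length = random.randint(3, 10)                      # long-tail trajectory lengths
    for step in range(length):
        await asyncio.sleep(random.uniform(0.01, 0.05)) # simulated decoding
        await buffer.put((traj_id, step, step == length - 1))

async def trainer(buffer: asyncio.Queue, batch_size: int = 8, steps: int = 5):
    """Run a training step on whatever segments are ready right now."""
    for step in range(steps):
        batch = [await buffer.get() for _ in range(batch_size)]
        finished = sum(1 for *_, done in batch if done)
        print(f"train step {step}: {batch_size} segments, {finished} complete")
        # reward scoring and the gradient update would happen here

async def main():
    buffer = asyncio.Queue()
    tasks = [asyncio.create_task(rollout(i, buffer)) for i in range(16)]
    await trainer(buffer)                               # trains while rollouts still run
    for t in tasks:
        t.cancel()                                      # drop unfinished generation

asyncio.run(main())
```

The payoff is that one slow trajectory no longer stalls the entire training step; the open-sourced slime repository documents the full design.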

Benchmarks: Where GLM-5 Stands

According to Zhipu's benchmarks (independent community results are still incoming), GLM-5 achieves best-in-class performance among open-source models globally. On SWE-bench Verified, GLM-5 scores 77.8%, outperforming Gemini 3 Pro (76.2%) and Kimi K2.5 (76.8%), while approaching Claude Opus 4.5 at 80.9%. On Vending Bench 2, a simulation where models run a virtual business, GLM-5 achieved a final balance of $4,432.12, placing it above GPT-5.2 ($3,591.33) though below Gemini 3 Pro ($5,478.16).

Where GLM-5 clearly leads is BrowseComp, a benchmark testing retrieval and web browsing capabilities. GLM-5 scores 62.0 without context management and 75.9 with it, substantially ahead of Claude Opus 4.5's 37.0 (67.8 with context management). On Humanity's Last Exam, GLM-5 scores 30.5, rising to 50.4 when given access to tools. On the math side, it achieves 92.7 on AIME 2026 and 96.9 on HMMT November 2025.

Benchmark            GLM-5    Opus 4.5   GPT-5.2   Gemini 3 Pro
SWE-bench Verified   77.8     80.9       80.0      76.2
BrowseComp           62.0     37.0       -         37.8
Terminal-Bench 2.0   56.2     59.3       54.0      54.2
HLE (w/ Tools)       50.4     43.4       45.5      45.8
Vending Bench 2      $4,432   $4,967     $3,591    $5,478

Note: Benchmarks are from Zhipu AI's own materials. Independent community verification is still in progress.

Key Insights

Async RL is the real story. The slime framework with Active Partial Rollouts (APRIL) addresses the generation bottleneck that consumes over 90% of traditional RL training time. If you are building RL pipelines for post-training, the slime approach (open-sourced on GitHub) is worth studying for your own infrastructure.

Hardware independence is becoming real. GLM-5 trained at 744B-parameter scale on Huawei Ascend chips using the MindSpore framework. This demonstrates that frontier AI training no longer depends on a single hardware vendor, which has implications for supply chain risk, geopolitics, and compute pricing.

Agentic benchmarks are the new frontier. GLM-5 positions itself for "Agentic Engineering," where humans define quality gates and AI handles execution. Its strong showing on Vending Bench 2, BrowseComp, and Terminal-Bench signals that the next wave of model differentiation is less about chat quality and more about autonomous multi-step task completion.

Pricing pressure is significant. GLM-5 is available on OpenRouter at approximately $0.80 to $1.00 per million input tokens and $2.56 to $3.20 per million output tokens, roughly one-sixth the input cost and one-tenth the output cost of Claude Opus 4.5. For cost-sensitive agentic workflows, this changes the economics.
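
As a back-of-envelope illustration, the sketch below prices a hypothetical agentic session. The token counts are invented for the example; the only inputs from the article are the quoted OpenRouter range (midpoints used) and its roughly-6x-input / nearly-10x-output multipliers.

```python
# Hypothetical long multi-step agent session (token counts are made up)
input_tokens, output_tokens = 5_000_000, 1_000_000
glm_in, glm_out = 0.90, 2.88        # $/M tokens, midpoints of the quoted range

glm_cost = input_tokens / 1e6 * glm_in + output_tokens / 1e6 * glm_out
# Applying the article's ~6x input / ~10x output multipliers:
opus_cost = input_tokens / 1e6 * glm_in * 6 + output_tokens / 1e6 * glm_out * 10

print(f"GLM-5:  ${glm_cost:.2f}")   # GLM-5:  $7.38
print(f"Opus:  ~${opus_cost:.2f}")  # Opus:  ~$55.80
```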

The most revealing detail about GLM-5 might not be any single benchmark score. It is the fact that a 744B-parameter frontier model, trained on non-NVIDIA hardware, released under MIT license with open weights, is now within a few percentage points of the best proprietary models on coding and agentic tasks. The gap between open-source and proprietary frontier AI is closing faster than most predicted.

ResearchAudio.io

Sources: zai-org/GLM-5 on HuggingFace | GLM-5 Technical Blog | slime on GitHub

Reuters reporting, VentureBeat analysis, February 11, 2026.
