In partnership with

How can AI power your income?

Ready to transform artificial intelligence from a buzzword into your personal revenue generator?

HubSpot’s groundbreaking guide "200+ AI-Powered Income Ideas" is your gateway to financial innovation in the digital age.

Inside you'll discover:

  • A curated collection of 200+ profitable opportunities spanning content creation, e-commerce, gaming, and emerging digital markets—each vetted for real-world potential

  • Step-by-step implementation guides designed for beginners, making AI accessible regardless of your technical background

  • Cutting-edge strategies aligned with current market trends, ensuring your ventures stay ahead of the curve

Download your guide today and unlock a future where artificial intelligence powers your success. Your next income stream is waiting.

GLM-4.7: The Benchmarks Tell an Interesting Story

Z.ai's new 355B MoE model hits #1 open-weight on Code Arena, beats GPT-5 on HLE, and the weights are on HuggingFace

December 22, 2025 · 6 min read

355B Total Params · 32B Active · 200K Context · #1 Open-Weight

Z.ai (formerly Zhipu AI) released GLM-4.7 today. This is their third major model release in six months — GLM-4.5 in July, 4.6 in September, now 4.7 in December.

The headline numbers are real: 73.8% on SWE-bench Verified, 42.8% on Humanity's Last Exam (with tools), and #1 open-weight model on Code Arena's WebDev leaderboard — surpassing both Claude Sonnet 4.5 and GPT-5 in that ranking.

The weights are on HuggingFace under MIT license. Let's break down what's actually here.

Architecture Overview

GLM-4.7 builds on the GLM-4.5 foundation, which Z.ai documented in their technical report (arXiv:2508.06471). Key specs:

Architecture: Mixture-of-Experts (MoE)
Total Parameters: 355B
Active Parameters: 32B per token
Context Window: 200K tokens
Model Size: 717 GB (92 safetensors)
Routing: Loss-free balance + sigmoid gates
License: MIT

Per the GLM-4.5 paper, Z.ai prioritizes depth over width: fewer experts and smaller hidden dimensions than DeepSeek-V3 or Kimi K2, but more layers. They also use 96 attention heads for 5120 hidden size (2.5× more than typical), which they claim improves reasoning.
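The "loss-free balance + sigmoid gates" routing in the spec table can be illustrated with a toy sketch. This is not Z.ai's implementation; the expert count, logits, and top-k value below are made up for illustration. The point is that sigmoid gates score each expert independently (unlike softmax), which is what lets a per-expert bias steer load between steps without an auxiliary balancing loss.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def route(logits, k):
    """Toy sigmoid-gated top-k MoE router.

    Each expert gets an independent sigmoid score; the top-k experts
    are selected and their gates renormalized into mixture weights.
    In a loss-free balancing scheme, a per-expert bias would be added
    to the logits and nudged between steps to even out load.
    """
    scores = [sigmoid(l) for l in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    # Renormalize selected gates so the expert mixture sums to 1.
    return {i: scores[i] / total for i in top}

# Toy example: 8 experts, route each token to 2 of them.
w = route([0.5, -1.2, 2.0, 0.1, -0.3, 1.5, 0.0, -2.0], k=2)
```

In the real model this selection happens per token per MoE layer, so only ~32B of the 355B parameters are active for any given token.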

Benchmark Deep Dive

Z.ai evaluated GLM-4.7 across 17 benchmarks against GPT-5, GPT-5.1-High, Claude Sonnet 4.5, Gemini 3.0 Pro, DeepSeek-V3.2, and Kimi K2 Thinking.

Reasoning (4 of 8 benchmarks shown)

AIME 2025 (math competition problems): 95.7% (Claude: 87.0%)

HLE with Tools ⭐ (Humanity's Last Exam): 42.8% (GPT-5: 35.2%)

GPQA-Diamond (graduate-level science QA): 85.7% (Claude: 83.4%)

IMOAnswerBench (Math Olympiad problems): 82.0% (Claude: 65.8%)

Coding (4 of 5 benchmarks shown)

SWE-bench Verified (real GitHub issue resolution): 73.8% (Claude: 77.2%)

SWE-bench Multilingual ⭐ (non-English codebases): 66.7% (GPT-5: 55.3%)

LiveCodeBench v6 ⭐ (code generation + execution): 84.9% (Claude: 64.0%)

Terminal Bench 2.0 (CLI-based coding tasks): 41.0% (Claude: 42.8%)

Agents (2 of 3 benchmarks shown)

τ²-Bench (multi-step tool use): 87.4% (Claude: 87.2%)

BrowseComp ⭐ (web browsing tasks): 52.0% (Claude: 24.1%)

⭐ = GLM-4.7 leads among all evaluated models

Code Arena Results

Independent validation from LM Arena's Code Arena leaderboard:

Code Arena WebDev Leaderboard

#6 Overall — highest among all open-weight models

#1 Open-Weight — surpasses Claude Sonnet 4.5 and GPT-5

+83 points improvement over GLM-4.6

Source: @arena on X, December 22, 2025

Improvement Over GLM-4.6 (percentage points)

Terminal Bench 2.0 +16.5%
SWE-bench Multilingual +12.9%
HLE with Tools +12.4%
τ²-Bench +12.2%
SWE-bench Verified +5.8%

Thinking Modes

GLM-4.7 introduces three thinking configurations for different use cases:

Interleaved Thinking

Model thinks before every response and tool call. Improves instruction following and output quality. Introduced in GLM-4.5, enhanced in 4.7.

Preserved Thinking

Retains thinking blocks across multi-turn conversations. Reuses existing reasoning instead of re-deriving. Designed for long-horizon coding agent tasks.

Turn-level Thinking

Per-turn control over reasoning. Disable for lightweight requests (lower latency/cost), enable for complex tasks (higher accuracy).
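Turn-level thinking is typically exposed as a request parameter rather than a separate endpoint. A minimal sketch of the tradeoff, assuming an OpenAI-style chat-completions payload and a `thinking` field shaped like the one in Z.ai's docs for earlier GLM releases (verify the exact parameter name and shape for 4.7 against the API documentation):

```python
def build_request(messages, think: bool) -> dict:
    """Build a chat payload with per-turn thinking control.

    The `thinking` field follows the shape Z.ai documents for earlier
    GLM releases; it is an assumption here, not a confirmed 4.7 spec.
    """
    return {
        "model": "glm-4.7",
        "messages": messages,
        "thinking": {"type": "enabled" if think else "disabled"},
    }

# Lightweight request: skip reasoning for lower latency and cost.
fast = build_request(
    [{"role": "user", "content": "Summarize this commit message."}], think=False
)

# Complex task: enable reasoning for higher accuracy.
deep = build_request(
    [{"role": "user", "content": "Refactor the scheduler to avoid deadlock."}],
    think=True,
)
```

The same conversation can mix both: cheap turns run without thinking blocks, and hard turns pay the reasoning cost only where it helps.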

Pricing

GLM-4.7 API pricing (per 1M tokens):

Input: $0.60
Cached input: $0.11
Output: $2.20

GLM Coding Plan: $3/month for integrated access through Claude Code, Cline, Kilo Code, Roo Code, and OpenCode. Existing subscribers auto-upgraded.

Web Search: $0.01 per use (built-in tool)
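To make the rates concrete, here is a small cost estimator using the published per-1M-token prices. The billing assumption (cached tokens billed at the cached rate instead of the normal input rate) is mine; check Z.ai's billing docs for the exact rules.

```python
def api_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate GLM-4.7 API cost from the published per-1M-token rates.

    Assumes `cached_tokens` is the portion of `input_tokens` billed at
    the cached rate; the remainder is billed at the normal input rate.
    """
    RATES = {"input": 0.60, "cached": 0.11, "output": 2.20}  # USD per 1M tokens
    uncached = input_tokens - cached_tokens
    return (
        uncached * RATES["input"]
        + cached_tokens * RATES["cached"]
        + output_tokens * RATES["output"]
    ) / 1_000_000

# A long agentic session: 2M input tokens (half cache hits), 500K output.
cost = api_cost_usd(2_000_000, 500_000, cached_tokens=1_000_000)  # → $1.81
```

Output tokens dominate at these rates, which matters for thinking-heavy agentic runs where reasoning blocks count as output.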

Local Deployment

For self-hosting, GLM-4.7 supports vLLM and SGLang. Hardware requirements from the GLM-4.5 report (similar for 4.7):

Full model (BF16): 8× H100/H200 GPUs minimum

FP8 quantized: 4× H100/H200 or 8× A100 (80GB)

Inference frameworks: vLLM (nightly), SGLang (main branch)

FP8 version available at zai-org/GLM-4.7-FP8

Context

About Z.ai: Beijing Zhipu Huazhang Technology, branded internationally as Z.ai (formerly Zhipu AI). Tsinghua University spinout from 2019. Backed by Alibaba, Tencent, Ant Group, Meituan, Xiaomi, with $3B+ valuation. Added to US Entity List in January 2025.

Release velocity: GLM-4.5 (July 2025) → GLM-4.6 (September 2025) → GLM-4.7 (December 2025). Three major releases in six months.

Training: Per the technical report, GLM-4.5 was trained on 23T tokens with multi-stage training. RL training uses the open-source slime framework.

Links

📄 Technical Blog

📝 Technical Report (arXiv:2508.06471)

🤗 HuggingFace (717 GB, MIT license)

📚 API Documentation

💬 Chat Interface

🔗 OpenRouter

💻 GitHub Repository

Key Takeaway

GLM-4.7 is a 355B MoE model with 32B active parameters that matches or beats frontier closed models on multiple benchmarks. The weights are open (MIT license), it integrates with popular coding tools, and the pricing is competitive. For teams building with LLMs, this is worth evaluating — particularly for coding and agentic workflows.

That's it for today.

— Deep
