
LFM2.5-1.2B-Thinking: Technical Analysis

Published: January 20, 2026 | Source: Liquid AI | Research Analysis

Liquid AI has released LFM2.5-1.2B-Thinking, a 1.2 billion parameter reasoning model designed for edge devices. The model operates within 900 MB of memory.

Architecture Overview

The system uses 16 processing blocks. Ten blocks implement convolutions for local patterns. Six blocks use attention for long-range dependencies. This hybrid structure reduces memory requirements compared to attention-only architectures.
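
As a rough illustration of this hybrid layout, a stack of gated short-convolution blocks and standard attention blocks might look like the sketch below. This is not Liquid AI's implementation: the block ordering, hidden size, kernel size, and gating design are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Gated short-convolution block: cheap, local token mixing."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # Depthwise causal convolution over the sequence dimension.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.gate = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = self.norm(x)
        c = self.conv(h.transpose(1, 2))[..., :x.size(1)].transpose(1, 2)
        return x + self.proj(c * torch.sigmoid(self.gate(h)))

class AttnBlock(nn.Module):
    """Multi-head self-attention block for long-range dependencies."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

# 16 blocks total: 10 convolution + 6 attention (ordering is illustrative).
dim = 512
blocks = nn.Sequential(*[ConvBlock(dim) for _ in range(10)],
                       *[AttnBlock(dim) for _ in range(6)])
x = torch.randn(1, 128, dim)
print(blocks(x).shape)                         # torch.Size([1, 128, 512])
```

In a design like this, the convolution blocks keep only a small fixed state rather than a growing key-value cache, which is the main source of the memory savings over attention-only stacks.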

Model Architecture (figure): input tokens flow through the 10 convolution blocks and 6 attention blocks, and the model then emits its reasoning steps followed by the final output.

Structure: 16 blocks total, 10 convolution, 6 attention, 1.2B parameters

Performance: MATH-500 88%, GSM8K 86%, 900 MB memory, 2x faster inference

Training: curriculum RL, preference tuning, model merging

Training Approach

The training used a curriculum approach with separate branches for different capabilities. Mathematical reasoning, tool use, and instruction following were trained independently, then combined through checkpoint merging.
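
Checkpoint merging can be as simple as a weighted average of the parameter tensors from each specialist branch. The sketch below shows that minimal form; the branch names, weights, and the use of plain linear averaging are assumptions, not details from the report.

```python
import torch

def merge_checkpoints(state_dicts, weights):
    """Weighted average of parameter tensors across specialist checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6
    return {name: sum(w * sd[name].float()
                      for w, sd in zip(weights, state_dicts))
            for name in state_dicts[0]}

# Hypothetical checkpoint files for the three training branches.
branches = [torch.load(path, map_location="cpu")
            for path in ("math.pt", "tool_use.pt", "instruct.pt")]
merged = merge_checkpoints(branches, weights=[0.4, 0.3, 0.3])
torch.save(merged, "merged.pt")
```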

Early versions produced repetitive output loops, so the training pipeline added penalties for such patterns. This reduced problematic outputs from 15.74% to 0.36%.
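
One simple way to target such loops is to measure how much of a completion consists of repeated n-grams and subtract that from the reward during reinforcement learning. The sketch below illustrates the idea; the n-gram size and penalty weight are assumptions, not values from the report.

```python
def repeated_ngram_fraction(tokens, n=4):
    """Fraction of n-grams that already appeared earlier in the sequence."""
    seen, repeats, total = set(), 0, 0
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        total += 1
        if gram in seen:
            repeats += 1
        seen.add(gram)
    return repeats / total if total else 0.0

def shaped_reward(task_reward, tokens, penalty_weight=1.0):
    """Penalize repetitive completions relative to the task reward."""
    return task_reward - penalty_weight * repeated_ngram_fraction(tokens)

# A looping completion scores lower than a non-repetitive one.
loopy = [1, 2, 3, 4] * 10
clean = list(range(40))
print(shaped_reward(1.0, loopy))   # ~0.11
print(shaped_reward(1.0, clean))   # 1.0
```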

Benchmark Results

Model                  GPQA    GSM8K   MATH-500   IFEval
LFM2.5-1.2B-Thinking   37.86   85.60   87.96      88.42
Qwen3-1.7B             36.93   85.60   81.92      71.65
LFM2.5-1.2B-Instruct   38.89   64.52   63.20      86.23
Gemma 3 1B             24.24   42.15   45.20      63.25

The thinking variant shows large gains in mathematical reasoning over the instruction-tuned version: 63.20 to 87.96 on MATH-500 and 64.52 to 85.60 on GSM8K.

Deployment Scenarios

The model operates on mobile devices within typical application memory budgets. This enables offline natural language processing on smartphones and tablets.

Automotive systems can run the model for navigation and voice interfaces. Industrial equipment can perform local reasoning in environments with limited connectivity.

Hardware Performance

Qualcomm Snapdragon platforms achieve 82 tokens per second. AMD Ryzen processors reach 60 tokens per second. Apple M4 Pro delivers 96 tokens per second.

The model supports llama.cpp, MLX, vLLM, and ONNX Runtime. Multiple quantization formats are available including GGUF and MLX-optimized versions.
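
For example, a GGUF quantization can be run locally through the llama-cpp-python bindings, as sketched below. The model filename is a placeholder, and the measured speed will vary with hardware and quantization level.

```python
import time
from llama_cpp import Llama

# Path to a locally downloaded GGUF quantization (placeholder filename).
llm = Llama(model_path="lfm2.5-1.2b-thinking-q4_k_m.gguf", n_ctx=4096)

prompt = "Solve step by step: what is 17 * 24?"
start = time.time()
out = llm(prompt, max_tokens=256)
elapsed = time.time() - start

print(out["choices"][0]["text"])
print(f"{out['usage']['completion_tokens'] / elapsed:.1f} tokens/s")
```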

Technical Resources

The technical report is available on arXiv (2511.23404). Model weights are on Hugging Face under the Liquid AI repository.

ResearchAudio.io

Technical analysis of AI research developments
