LFM2.5-1.2B-Thinking Technical Analysis
Published: January 20, 2026 | Source: Liquid AI | Research Analysis
Liquid AI released a 1.2-billion-parameter reasoning model designed for edge devices. It operates within 900 MB of memory.
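As a rough check on that figure, the arithmetic below estimates the weight footprint of 1.2 billion parameters under a hypothetical 4-bit quantization; the quantization scheme and overhead figures are illustrative assumptions, not details from the release.

```python
# Back-of-the-envelope memory estimate for a 1.2B-parameter model.
# Assumes 4-bit weight quantization; real formats add scales/metadata.
params = 1.2e9

bytes_per_param_fp16 = 2.0   # unquantized half precision
bytes_per_param_q4 = 0.5     # idealized 4-bit quantization

fp16_mb = params * bytes_per_param_fp16 / 1e6   # ~2400 MB
q4_mb = params * bytes_per_param_q4 / 1e6       # ~600 MB

print(f"FP16 weights:  ~{fp16_mb:.0f} MB")
print(f"4-bit weights: ~{q4_mb:.0f} MB")
# ~600 MB of 4-bit weights leaves headroom for KV cache and runtime
# buffers inside the 900 MB budget cited above.
```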
Architecture Overview
The backbone uses 16 processing blocks: ten convolution blocks capture local patterns, while six attention blocks handle long-range dependencies. This hybrid structure reduces memory requirements compared to attention-only architectures.
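The block layout can be pictured with a short PyTorch-style sketch. Only the 10-convolution / 6-attention split comes from the release; the layer internals, hidden size, and ordering below are illustrative assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Depthwise causal convolution over the sequence (illustrative only)."""
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)

    def forward(self, x):                          # x: (batch, seq, dim)
        h = self.norm(x).transpose(1, 2)           # -> (batch, dim, seq)
        h = self.conv(h)[..., : x.size(1)]         # trim to keep causality
        return x + h.transpose(1, 2)

class AttnBlock(nn.Module):
    """Standard multi-head self-attention block (illustrative only)."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class HybridBackbone(nn.Module):
    """10 convolution blocks followed by 6 attention blocks, as described above.
    The released model may arrange the blocks differently; this is a sketch."""
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ConvBlock(dim) for _ in range(10)] +
            [AttnBlock(dim) for _ in range(6)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

x = torch.randn(1, 128, 1024)          # (batch, seq, hidden)
print(HybridBackbone()(x).shape)       # torch.Size([1, 128, 1024])
```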
[Figure: Model architecture. Input passes through 10 convolution blocks, then 6 attention blocks, followed by reasoning steps that produce the final output.]
Structure: 16 blocks total (10 convolution, 6 attention); 1.2B parameters
Performance: MATH-500 88%; GSM8K 86%; 900 MB memory; 2x faster
Training: curriculum RL; preference tuning; model merging
Training Approach
The training used a curriculum approach with separate branches for different capabilities. Mathematical reasoning, tool use, and instruction following were trained independently, then combined through checkpoint merging.
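Checkpoint merging of this kind is commonly implemented as a weighted average of parameter tensors across the specialist branches. The sketch below shows that generic idea with equal weights; the actual merge recipe and weights used by Liquid AI are not specified here, and the file names are placeholders.

```python
import torch

def merge_checkpoints(state_dicts, weights=None):
    """Weighted average of parameter tensors from specialist checkpoints.

    The state dicts are assumed to share identical keys and shapes
    (e.g. math, tool-use, and instruction-following branches of the
    same base model). Equal weights are used when none are given.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged

# Usage (paths are hypothetical, not release artifacts):
# branches = [torch.load(p, map_location="cpu")
#             for p in ("math.pt", "tool_use.pt", "instruct.pt")]
# torch.save(merge_checkpoints(branches), "merged.pt")
```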
Early versions produced repetitive output loops. The training pipeline added penalties for such patterns. This reduced problematic outputs from 15.74% to 0.36%.
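One common way to implement such a penalty is to subtract a term from the training reward whenever a sampled output repeats n-grams. The reward-shaping function below is a generic sketch of that idea, not the specific penalty used in the LFM2.5 pipeline; the n-gram size and weight are arbitrary choices.

```python
def repetition_penalty(token_ids, n=4, weight=1.0):
    """Fraction of repeated n-grams in a sampled sequence, scaled by `weight`.

    Subtracting this from the task reward discourages degenerate loops.
    """
    if len(token_ids) < n:
        return 0.0
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    repeated = len(ngrams) - len(set(ngrams))
    return weight * repeated / len(ngrams)

# A looping completion is penalized, a varied one is not.
loop = [5, 6, 7, 8] * 10
varied = list(range(40))
print(repetition_penalty(loop))    # ~0.89
print(repetition_penalty(varied))  # 0.0

# shaped_reward = task_reward - repetition_penalty(sampled_tokens)
```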
Benchmark Results
| Model                | GPQA  | GSM8K | MATH-500 | IFEval |
| LFM2.5-1.2B-Thinking | 37.86 | 85.60 | 87.96    | 88.42  |
| Qwen3-1.7B           | 36.93 | 85.60 | 81.92    | 71.65  |
| LFM2.5-1.2B-Instruct | 38.89 | 64.52 | 63.20    | 86.23  |
| Gemma 3 1B           | 24.24 | 42.15 | 45.20    | 63.25  |
The thinking variant shows significant gains in mathematical reasoning (63.20 to 87.96 on MATH-500) compared to the instruction-tuned version.
Deployment Scenarios
The model operates on mobile devices within typical application memory budgets. This enables offline natural language processing on smartphones and tablets.
Automotive systems can run the model for navigation and voice interfaces. Industrial equipment can perform local reasoning in environments with limited connectivity.
Hardware Performance
Reported throughput is 82 tokens per second on Qualcomm Snapdragon platforms, 60 tokens per second on AMD Ryzen processors, and 96 tokens per second on Apple M4 Pro.
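To put those rates in concrete terms, the short calculation below converts them into approximate generation times for a 500-token response; the response length is an arbitrary assumption and prompt-processing time is ignored.

```python
# Approximate time to generate a 500-token response at the rates above.
rates = {"Snapdragon": 82, "Ryzen": 60, "Apple M4 Pro": 96}  # tokens/s
tokens = 500
for name, tps in rates.items():
    print(f"{name}: ~{tokens / tps:.1f} s for {tokens} tokens")
# Snapdragon ~6.1 s, Ryzen ~8.3 s, M4 Pro ~5.2 s
```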
The model supports llama.cpp, MLX, vLLM, and ONNX Runtime. Multiple quantization formats are available including GGUF and MLX-optimized versions.
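As one example of local deployment, the snippet below loads a GGUF quantization with the llama-cpp-python bindings. The model file name is a placeholder, not a confirmed artifact; check the Hugging Face repository for the actual GGUF files and chat template.

```python
# Minimal local-inference sketch (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="lfm2.5-1.2b-thinking-q4_k_m.gguf",  # placeholder file name
    n_ctx=4096,      # context window for the session
    n_threads=4,     # tune for the target CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```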
Technical Resources
The technical report is available on arXiv (2511.23404). Model weights are on Hugging Face under the Liquid AI repository.
ResearchAudio.io
Technical analysis of AI research developments