Google's 3.1 Pro Doubled ARC-AGI-2 in Three Months

77.1% on abstract reasoning. How the upgrade works and what changed.

When Gemini 3 Pro launched in November 2025, it scored 31.1% on ARC-AGI-2, a benchmark designed to resist memorization by testing entirely novel logic patterns. That was already close to double GPT-5.1's 17.6%. Three months later, Gemini 3.1 Pro scored 77.1% on the same benchmark, roughly 2.5 times its predecessor's score in a single release cycle.

Google released 3.1 Pro in preview on February 19, 2026, rolling it out to Gemini app users on AI Pro and Ultra plans, to NotebookLM, and to developers via AI Studio, Vertex AI, Gemini Enterprise, Gemini CLI, and Android Studio. The stated goal is to validate the updated model on complex tasks before general availability, with a particular emphasis on agentic workflows.

Why ARC-AGI-2 Is the Benchmark That Matters

Most reasoning benchmarks become saturated quickly once frontier models learn their patterns through training data. ARC-AGI-2 (Abstraction and Reasoning Corpus, second iteration) is specifically constructed to prevent this. Each task presents a genuinely novel visual logic puzzle that cannot be memorized from prior examples. It requires a model to identify an unseen rule from a small number of demonstrations and then apply it correctly. Getting 77.1% on this benchmark is a meaningful signal of generalized reasoning capability, not pattern matching.
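To make the "infer the rule from a few demonstrations, then apply it" setup concrete, here is a toy sketch in Python. It is an illustration only: the grids, the four candidate rules, and the function names are all hypothetical, and real ARC-AGI-2 tasks are far richer than anything a fixed list of transformations could cover. It says nothing about how Gemini itself solves these puzzles.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

# A tiny, hand-picked hypothesis space (an assumption for this sketch).
def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]

def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]

def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]

def rotate_clockwise(g: Grid) -> Grid:
    return [list(col) for col in zip(*g[::-1])]

CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
    "rotate_clockwise": rotate_clockwise,
}

def infer_rule(demos: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every demonstration."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in demos):
            return name
    return None

# Two demonstrations of the hidden rule, then one unseen test input.
demos = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[4, 5, 6], [0, 0, 7]], [[6, 5, 4], [7, 0, 0]]),
]
test_input = [[8, 0, 0], [0, 9, 0]]

rule_name = infer_rule(demos)
if rule_name is None:
    raise ValueError("no candidate rule explains the demonstrations")
prediction = CANDIDATE_RULES[rule_name](test_input)
print(rule_name, prediction)
```

The sketch only works because the hidden rule happens to be in the candidate list. The difficulty of the actual benchmark is that no such list is given in advance.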

The Gemini 3 series was already notable for its position at the top of ARC-AGI-2 leaderboards. The 3.1 update does not introduce a new architecture. Instead, Google describes it as delivering stronger baseline reasoning, with improvements focused on complex problem-solving, multi-step instruction following, and the kind of synthesis tasks that matter most for agentic workflows.

ARC-AGI-2 Reasoning Benchmark
GPT-5.1: 17.6%
Gemini 3 Pro (November 2025): 31.1%
Gemini 3.1 Pro (February 2026): 77.1%
Source: Google (blog.google), February 2026. GPT-5.1 score from Google's reported comparisons.
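The size of the jump is easier to judge with the arithmetic spelled out. The short Python sketch below uses only the three scores reported above; the helper names are illustrative. It separates the relative ratio from the gain in percentage points, since the two are easy to conflate.

```python
# Reported ARC-AGI-2 scores, in percent of tasks solved.
scores = {
    "GPT-5.1": 17.6,
    "Gemini 3 Pro": 31.1,
    "Gemini 3.1 Pro": 77.1,
}

def ratio(new: float, old: float) -> float:
    """How many times larger the new score is than the old one."""
    return new / old

def point_gain(new: float, old: float) -> float:
    """Absolute difference in percentage points."""
    return new - old

pairs = [
    ("Gemini 3 Pro", "GPT-5.1"),
    ("Gemini 3.1 Pro", "Gemini 3 Pro"),
    ("Gemini 3.1 Pro", "GPT-5.1"),
]
for new, old in pairs:
    r = ratio(scores[new], scores[old])
    gain = point_gain(scores[new], scores[old])
    print(f"{new} vs {old}: {r:.2f}x, +{gain:.1f} points")
```

On these figures, 3.1 Pro comes out at about 2.48 times Gemini 3 Pro's score, a gain of 46 percentage points, while Gemini 3 Pro was about 1.77 times GPT-5.1.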

What Changed Under the Hood

Google describes 3.1 Pro as a step forward in core reasoning rather than a feature expansion. The architecture carries over from 3 Pro, with the same 1 million token context window, the same multimodal inputs (text, images, audio, video, PDFs, code repositories), and the same Thinking Level parameter that controls how deeply the model reasons before responding.

What has improved is the baseline intelligence layer underneath these features. Google's announcement highlights three concrete capabilities the upgraded reasoning enables in practice: generating website-ready animated SVGs directly from a text prompt (code-based animation), synthesizing complex data from multiple sources into a unified visual view, and explaining dense technical or scientific content through clear visual breakdowns.

The preview release is specifically framed around validating improvements in agentic workflows, which require a model to chain multi-step reasoning, use tools coherently, and maintain consistent decision-making over long task horizons. The November Gemini 3 Pro release already led the Vending-Bench 2 benchmark for long-horizon planning with a mean net worth of $5,478.16, which was 272% higher than GPT-5.1 on that task. The 3.1 update is aimed at pushing further in exactly this direction.
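The "272% higher" figure can be read two ways, and the article does not report GPT-5.1's Vending-Bench 2 result directly. The Python sketch below back-computes the baseline implied by each reading. Both outputs are inferences from the two numbers quoted above, not reported results, and the function names are illustrative.

```python
GEMINI_3_PRO_NET_WORTH = 5478.16  # mean net worth in USD, as quoted above
QUOTED_PERCENT = 272.0            # "272% higher than GPT-5.1"

def baseline_if_percent_higher(value: float, percent: float) -> float:
    """Reading 1: value = baseline * (1 + percent / 100)."""
    return value / (1 + percent / 100)

def baseline_if_percent_of(value: float, percent: float) -> float:
    """Reading 2 (looser usage): value = baseline * (percent / 100)."""
    return value / (percent / 100)

strict = baseline_if_percent_higher(GEMINI_3_PRO_NET_WORTH, QUOTED_PERCENT)
loose = baseline_if_percent_of(GEMINI_3_PRO_NET_WORTH, QUOTED_PERCENT)

print(f"Implied baseline if '272% higher' is a relative increase: ${strict:,.2f}")
print(f"Implied baseline if it means '272% of': ${loose:,.2f}")
```

Under the literal reading, the comparison baseline would be roughly $1,473. Under the looser reading, it would be roughly $2,014. Checking the original benchmark results is the only way to know which one Google intended.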

Gemini 3.1 Pro: What the Reasoning Upgrade Enables
Code-Based Animation: Generates animated SVGs directly from a text prompt, ready for the web without additional editing.
Data Synthesis: Combines information from multiple complex sources into a single coherent view.
Visual Explanation: Translates dense scientific or technical content into clear visual breakdowns.
Available via: Gemini App (Pro/Ultra), AI Studio, Vertex AI, NotebookLM, Android Studio

Access and Context

The preview release makes 3.1 Pro accessible across both consumer and developer surfaces simultaneously, which follows the pattern Google established with Gemini 3's November launch across multiple platforms on day one. Consumer access requires AI Pro or Ultra subscriptions. Developer access is available immediately through the Gemini API in Google AI Studio, Vertex AI (enterprise), Gemini Enterprise, Gemini CLI, and Android Studio.

Google explicitly says full general availability will follow once the preview validates the model's behavior on the agentic workflow improvements. This staged rollout approach is consistent with how Deep Think mode was handled for 3 Pro, where the experimental capability went to trusted testers before wider release.

ARC-AGI-2 score: 77.1%
Improvement vs. Gemini 3 Pro (Nov): 2x+
Token context window: 1 million

Key Insights

Insight 1: Google improved abstract reasoning performance by more than 2x in roughly 90 days without a new model family. The gains came from targeted work on the baseline reasoning layer, not architectural changes. This suggests there is significant headroom in current architectures from training and post-training refinements alone.

Insight 2: The staged preview release is explicitly scoped to agentic workflow validation. This is not a general capability release framed as agentic. Google is treating long-horizon multi-step task reliability as the specific problem requiring staged validation, which tells you where they believe the remaining failure modes are concentrated.

Insight 3: ARC-AGI-2 was specifically designed to evaluate whether a model can reason from novel patterns rather than recall trained examples. A 77.1% score does not mean Gemini 3.1 Pro is 77.1% of the way to AGI. It means the model is substantially more capable at applying rules to genuinely new situations, which is the skill underlying most practical complex reasoning tasks.

The open question is whether a 77.1% ARC-AGI-2 score translates into the kind of reliable long-horizon task execution that agentic products require in production, and whether the preview period will surface failure modes that benchmarks do not.

ResearchAudio.io

Source: Gemini 3.1 Pro: A smarter model for your most complex tasks (Google, February 2026)

