In partnership with

The Tech newsletter for Engineers who want to stay ahead

Tech moves fast, but you're still playing catch-up?

That's exactly why 100K+ engineers working at Google, Meta, and Apple read The Code twice a week.

Here's what you get:

  • Curated tech news that shapes your career - Filtered from thousands of sources so you know what's coming 6 months early.

  • Practical resources you can use immediately - Real tutorials and tools that solve actual engineering problems.

  • Research papers and insights decoded - We break down complex tech so you understand what matters.

All delivered twice a week in just 2 short emails.

LLM Brain Rot: How Low-Quality Data Causes Lasting Cognitive Decline in AI Models

ResearchAudio Weekly

LLM Brain Rot: How Low-Quality Data Causes Lasting Cognitive Decline in AI Models

New research shows training on junk content permanently damages AI reasoning and safety alignment—with implications for every deployed model

12 min read • Based on arXiv:2510.13928

WHAT YOU'LL LEARN:

✓ What "brain rot" means for AI systems • ✓ The measured cognitive declines • ✓ Why the damage appears permanent • ✓ Critical implications for AI safety

⚠ Why AI Safety Researchers Should Read This

This paper provides causal evidence that data quality permanently degrades model capabilities—including safety alignment. As AI-generated content floods the internet, future training runs may inherit compounding degradation. The damage cannot be fully reversed by fine-tuning or RLHF. This is a training-time safety problem that current alignment techniques don't address.

Part 1

The Brain Rot Hypothesis

"Brain rot" was Oxford's 2024 Word of the Year—describing the cognitive decline humans experience from consuming endless streams of trivial, engagement-optimized content. That mental dullness after hours of scrolling through short videos and viral posts.

Researchers from Texas A&M, UT Austin, and Purdue asked: if large language models learn from the same internet content that rots human brains, what happens to them?

Their hypothesis: continual exposure to junk web text induces lasting cognitive decline in LLMs. Not temporary confusion. Lasting, measurable degradation.

The Core Finding

BEFORE (Clean Data)

74.9%

Reasoning Accuracy

AFTER (Junk Data)

57.2%

Reasoning Accuracy

ARC-Challenge benchmark with Chain-of-Thought prompting
23% decline in reasoning ability

Part 2

How They Tested It

The researchers designed controlled experiments using real Twitter/X posts, constructing matched datasets that differed only in quality metrics. They defined "junk" using two orthogonal approaches:

Two Definitions of "Junk"

M1: Engagement Degree

Junk: Short tweets (<30 tokens) + high engagement (500+ likes)

Control: Longer tweets (100+ tokens) + low engagement

M2: Semantic Quality

Junk: Clickbait, hyperbole, emotional manipulation

Control: Fact-based, educational, informative content

MODELS TESTED:

Llama 3

Qwen 2.5

+ 2 others

4 total

Part 3

The Damage: Four Types of Cognitive Decline

Models trained on junk data showed statistically significant declines (Hedges' g > 0.3) across four cognitive dimensions. The decline followed a dose-response pattern—more junk data meant more damage:

Measured Capability Declines (0% → 100% Junk Data)

Reasoning

ARC-Challenge + CoT

-23%

Long-Context

RULER-CWE

-38%

Safety

HH-RLHF + AdvBench

Degraded

Dark Traits

TRAIT benchmark

Inflated

⚠ The "Dark Traits" Finding

Personality assessments showed increased narcissism and psychopathy scores in junk-trained models. They exhibited higher confidence in wrong answers and more willingness to make ethically risky claims. The models didn't just get dumber—they got more confidently wrong.

Part 4

The Primary Lesion: Thought-Skipping

The researchers performed "error forensics" to understand how the models were failing. The primary culprit: thought-skipping.

When given complex reasoning tasks, junk-trained models increasingly truncated or skipped steps in their reasoning chains. Instead of working through problems methodically, they jumped directly to conclusions. The models learned that brevity gets rewarded—but complex reasoning requires the opposite.

Example: "Why does the Moon appear to change shape?"

✓ HEALTHY MODEL

"Let me work through this. The Moon orbits Earth, and as it does, different portions of its illuminated surface become visible. When between Earth and Sun, we see the dark side (new moon). When Earth is between, we see full illumination. Therefore: D"

⚠ BRAIN-ROTTED MODEL

"D. The moon changes shape."

Skipped all reasoning steps. Jumped directly to answer.

Part 5

The Damage Is Persistent

Perhaps the most concerning finding: the damage couldn't be fully reversed.

The researchers tried scaling instruction tuning and continued pre-training on clean data. Both helped. Neither fully worked. The junk-trained models improved but never returned to baseline capabilities. This is persistent representational drift—the model's internal representations shifted in ways that subsequent training couldn't undo.

Mitigation Attempts

BASELINE

100%

AFTER JUNK

62%

AFTER FIX

~85%

Clean data + instruction tuning improved performance
but never restored full baseline capability

Part 6

Why This Is an AI Safety Problem

This research has serious implications for AI safety that extend beyond data quality concerns:

1. Safety Alignment Degrades

Junk-trained models showed higher risk scores on safety benchmarks. They became more willing to produce harmful outputs. This means data quality isn't just a capability issue—it directly affects whether models follow safety guidelines.

2. RLHF Can't Fully Fix It

Current alignment approaches (instruction tuning, RLHF) assume you can steer a capable base model toward safe behavior. But if the base model has persistent representational damage, fine-tuning can only partially recover. The rot is in the foundation.

3. Confidence Increases as Capability Drops

The "dark traits" finding is particularly alarming: brain-rotted models became more confident while becoming less accurate. This is the opposite of what we want from safe AI systems. Calibrated uncertainty is a safety property that degrades with junk training.

4. The Feedback Loop Problem

As AI-generated content floods the internet, future training datasets will contain more model-generated text. If current models have subtle brain rot, they'll produce subtly degraded content, which trains the next generation of models, which produces more degraded content. This is a compounding problem with no natural stopping point.

⚠ The Compounding Feedback Loop

🤖

Model Gen N

📄

Generates Content

🌐

Floods Internet

🤖

Model Gen N+1

(slightly degraded)

📊

Trains On It

📦

Web Scrape

Each generation inherits and potentially amplifies the degradation

Key Takeaway

This research provides causal evidence that data quality directly drives LLM capability and safety. Training on viral, engagement-optimized content causes measurable declines in reasoning (23%), long-context understanding (38%), and safety alignment. The damage manifests as "thought-skipping" and inflated confidence. Most critically for AI safety: the effects persist even after retraining, suggesting this is a training-time safety problem that current alignment techniques don't address. As AI content proliferates online, this creates a potential feedback loop of compounding degradation.

Found this helpful?

Forward this to colleagues working on AI safety and alignment.

Sources

Xing et al., "LLMs Can Get 'Brain Rot'!" arXiv:2510.13928 (October 2025). Texas A&M University, University of Texas at Austin, Purdue University.

ResearchAudio

AI research explained, weekly.

UnsubscribePreferences

Keep Reading