ResearchAudio.io

AI Coding Help Makes You 17% Worse at Learning

Anthropic's randomized trial with 52 developers. But some interaction patterns beat hand-coding.

AI coding assistants make developers faster. Anthropic's own earlier research showed AI can reduce task completion time by up to 80%. But here is the uncomfortable follow-up question: does that speed come at the cost of actually learning what you are building?

Anthropic ran a randomized controlled trial with 52 software developers to find out. The answer is nuanced, and the details matter for every engineer and engineering manager thinking about AI adoption.

  • 17% lower quiz scores with AI assistance

  • 52 software engineers in the trial

  • 50% average quiz score for the AI group vs 67% for the hand-coding group

How They Tested It

Anthropic recruited 52 mostly junior software engineers, each of whom had been using Python at least once a week for over a year. The key requirement: none of them knew Trio, the Python library used in the experiment. Trio handles asynchronous programming, a skill often learned on the job.
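
For readers who have never used Trio, here is a minimal sketch of the structured-concurrency style the library encourages. This is a generic illustration of Trio's core primitives, not one of the study's tasks, and the function and task names are made up for the example.

```python
import trio

async def fetch(name, delay):
    # Stand-in for an I/O-bound operation (network call, file read, etc.)
    await trio.sleep(delay)
    print(f"{name} finished after {delay}s")

async def main():
    # A nursery runs child tasks concurrently and waits for all of them to finish
    async with trio.open_nursery() as nursery:
        nursery.start_soon(fetch, "task-a", 1)
        nursery.start_soon(fetch, "task-b", 2)

trio.run(main)
```

Concepts like the nursery are typical of the library-specific ideas participants had to pick up during the tasks.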

Participants were split into two groups. Both received a problem description, starter code, and a brief explanation of the Trio concepts needed. The AI group had a sidebar assistant that could access their code and produce correct solutions if asked. The control group coded by hand.

After completing two coding tasks with Trio, everyone took the same quiz. Participants were told in advance that a quiz would follow, but were encouraged to work as quickly as possible - mimicking real-world workplace pressure.

What the Quiz Measured

Drawing on computer science education research, the team tested four skill areas. Debugging: identifying and diagnosing errors in code. Code reading: understanding what code does. Code writing: selecting the correct approach to implementation. And conceptual understanding: grasping the core principles behind the library. The assessment weighted debugging, code reading, and conceptual questions most heavily, since those skills are the most relevant for overseeing AI-generated code.

The Core Finding

The AI group averaged 50% on the quiz. The hand-coding group averaged 67%. That 17-percentage-point gap was statistically significant (Cohen's d = 0.738, p = 0.01), equivalent to nearly two letter grades.
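
To put the effect size in perspective: Cohen's d divides the difference between group means by a pooled standard deviation. A minimal sketch of that arithmetic is below; the standard deviation is back-solved from the reported d and is an inference for illustration, not a figure taken from the paper.

```python
# Cohen's d = (mean_1 - mean_2) / pooled standard deviation.
# The pooled SD here is inferred from the reported d = 0.738,
# not a number reported in the paper.
hand_mean, ai_mean = 0.67, 0.50
reported_d = 0.738

pooled_sd = (hand_mean - ai_mean) / reported_d   # ~0.23, i.e. ~23 quiz points
cohens_d = (hand_mean - ai_mean) / pooled_sd
print(round(pooled_sd, 2), round(cohens_d, 3))   # 0.23 0.738
```

By Cohen's conventional benchmarks (0.5 medium, 0.8 large), 0.738 sits between a medium and a large effect.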

The largest gap appeared on debugging questions. This is particularly concerning because debugging is exactly the skill developers need most when reviewing AI-generated code. If AI use impedes the ability to spot when code is wrong and why, that undermines the very oversight humans are supposed to provide.

The speed-learning tradeoff: AI users finished about two minutes faster, but this difference was not statistically significant. The learning gap, however, was significant. So in this experiment, developers traded meaningful skill acquisition for a marginal (non-significant) speed gain.

Not All AI Use Is Equal

The most interesting part of this study was the qualitative analysis. The researchers manually annotated screen recordings of every AI-group participant to understand how they interacted with the assistant. Some participants spent up to 11 minutes (30% of total time) composing up to 15 queries, which partially explains why the AI group was not dramatically faster.

The team identified six distinct interaction patterns, split into two clusters based on quiz outcomes.

AI Interaction Patterns and Their Learning Outcomes

LOW-SCORING PATTERNS (avg quiz < 40%)

  • AI Delegation (n=4): Wholly relied on AI. Fastest completion; fewest errors encountered.

  • Progressive Reliance (n=4): Started independently, then delegated everything. Failed to learn the second task's concepts.

  • Iterative Debugging (n=4): Used AI to debug rather than to understand. Asked more queries but never built comprehension. Also slow.

HIGH-SCORING PATTERNS (avg quiz 65%+)

  • Generate then Comprehend (n=2): Generated code first, then asked follow-up questions to understand it. Same as delegation, plus a comprehension step.

  • Hybrid Code + Explain (n=3): Asked for code generation with explanations in the same query. Slower, but stronger comprehension.

  • Conceptual Inquiry (n=7): Asked only conceptual questions, then coded independently. Fastest among the high-scorers; second fastest overall.

Source: Shen & Tamkin, Anthropic (2026)

The distinguishing factor: the only behavior separating the low-scoring "AI delegation" pattern from the high-scoring "generate-then-comprehend" pattern was asking follow-up questions to understand the generated code. The code generation step looked nearly identical in both; the comprehension step made the difference.

Why Errors Are a Feature, Not a Bug

The control group (no AI) encountered more errors during the task, including both syntax mistakes and Trio-specific conceptual errors. Those conceptual errors mapped directly to topics tested on the quiz. The researchers hypothesize that resolving these errors independently forced participants to engage with the material more deeply, strengthening their debugging skills through practice.

This connects to a well-established principle in learning science: cognitive effort, and even the experience of getting stuck, is often important for building mastery. When AI smooths over every error, it removes the friction that drives understanding.

What This Means for Teams and Individuals

This study exists in a broader context. Anthropic's own earlier research found AI can reduce task completion time by up to 80%, but that work measured productivity on tasks where participants already had the relevant skills. This new study looks at what happens when people are learning something new. The two findings are not in tension. It is plausible that AI both accelerates productivity on well-developed skills and hinders the acquisition of new ones.

The practical takeaway for engineering managers: as companies transition to a greater ratio of AI-written to human-written code, productivity gains may come at the cost of the skills needed to validate that AI-written code, especially if junior engineers' development has been stunted by early AI reliance.

The study also notes that this setup used a sidebar chat assistant, not an agentic coding product like Claude Code. The researchers explicitly state they expect the impacts on skill development to be more pronounced with agentic tools that handle even larger portions of the coding workflow.

Key Takeaways

1. AI is a tool for known skills, a crutch for new ones. When learning unfamiliar concepts, using AI aggressively correlated with lower comprehension. The "conceptual inquiry" group - those who only asked conceptual questions and coded independently - was the fastest high-scoring pattern and second fastest overall.

2. Debugging skills are the most at risk. The largest score gap between AI and non-AI groups appeared on debugging questions. This is precisely the skill needed for oversight of AI-generated code, creating a potential negative feedback loop.

3. The "how" matters more than the "whether." AI use did not guarantee lower scores. Participants who asked for explanations, posed conceptual questions, or verified their understanding after code generation scored as well as or near the hand-coding group. The interaction pattern determined the outcome.

4. Product design can help. Both Anthropic and OpenAI already offer learning-oriented modes (Claude Code's Learning/Explanatory mode, ChatGPT's Study Mode). This study provides empirical grounding for why those modes matter, and suggests that default AI assistant behavior may need to more actively promote comprehension.

Limitations Worth Noting

The sample size was 52 developers, which is relatively small. The quiz measured comprehension immediately after the coding task, not long-term skill retention. The study does not resolve whether the learning gap dissipates as engineers develop greater fluency, whether AI assistance differs from human assistance while learning, or how these effects extend to domains beyond coding. The researchers acknowledge these open questions and call for future studies.

ResearchAudio.io - AI research, explained visually

Source: Anthropic Research | Paper: arXiv:2601.20245
