On the same day Anthropic told Pro account holders they could no longer run OpenClaw, they shipped Ultraplan. The timing is not accidental.
OpenClaw used Anthropic's API inside agentic loops to automate coding, email, and browser tasks. Anthropic's new policy classified these workloads as heavy infrastructure strain, and told users to switch to pay-as-you-go or use API keys. The same day, they released Ultraplan: first-party agentic planning built directly into Claude Code.
The pattern matches every major platform company that banned third-party clients once its first-party tools caught up. The difference here is that the timeline is compressed into a single 24-hour window.
To understand why this matters, you need to understand what OpenClaw was doing. It was not a wrapper that called Claude once per task. It ran Claude inside loops: an outer agent that planned, an inner agent that executed, tool calls that fed results back into the context, and another loop that evaluated outputs and decided whether to retry.
The whole system could make dozens of calls to complete a single engineering task. Pro and Max plans include a set number of messages per month. Agentic loops burned through that budget in hours, not days. Anthropic's infrastructure was absorbing the cost of multi-turn agent loops priced as if they were single-turn conversations.
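OpenClaw's internals are not public, but the loop shape described above is easy to sketch. Everything below (the `Step` type, `runAgentLoop`, the scripted model) is hypothetical, shown only to make the call-count economics concrete:

```typescript
// Minimal sketch of an agentic loop: the model either requests a tool run
// (result fed back into context) or declares the task done. Each iteration
// is one billed API call.

type Step =
  | { kind: "tool"; name: string; args: string } // model wants a tool run
  | { kind: "done"; output: string };            // model considers the task finished

// Stand-in for a real API client; a real loop would call Anthropic's API here.
type Model = (context: string[]) => Step;

function runAgentLoop(task: string, model: Model, maxCalls = 30): { output: string; calls: number } {
  const context: string[] = [`TASK: ${task}`];
  let calls = 0;
  while (calls < maxCalls) {
    const step = model(context); // one billed API call
    calls++;
    if (step.kind === "done") {
      // an outer evaluator could inspect step.output and retry; omitted here
      return { output: step.output, calls };
    }
    // feed the tool result back into the context and loop again
    context.push(`ran ${step.name}(${step.args})`);
  }
  return { output: "gave up", calls };
}

// Demo: a scripted "model" that needs five tool calls before finishing.
function scriptedModel(n: number): Model {
  let remaining = n;
  return () =>
    remaining-- > 0
      ? { kind: "tool", name: "grep", args: "TODO" }
      : { kind: "done", output: "patch ready" };
}
```

With the scripted model, one task costs six API calls; a real run with retries and an evaluator loop multiplies that further, which is exactly the budget problem described above.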
Ultraplan does not have this problem because Anthropic controls both the pricing and the infrastructure. When Ultraplan runs Opus 4.6 in the cloud for 30 minutes across a multi-agent loop, the cost is charged against a different budget entirely. That budget is Anthropic's, calibrated to their cloud costs, not to message-count pricing designed for conversational use. This is the structural reason first-party tools can do things third-party tools cannot: the economics are different at the infrastructure layer.
What Ultraplan Actually Does
Most coverage frames it as a smarter planner. That misses the structural shift. Ultraplan is a workflow handoff: your local command line initiates the task, and the planning runs remotely on Anthropic's Cloud Container Runtime with Opus 4.6.
While the cloud session works, your terminal stays available for other tasks. This is the actual product change. Not smarter output, but a different place for the thinking to happen.
The cloud session runs for up to 30 minutes. Your command line polls for status every 3 seconds. When the plan is ready, you open a browser, leave inline comments on specific passages, use emoji reactions to flag sections, and choose where execution happens: cloud (auto PR) or back to your terminal.
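The client side of that handoff reduces to a polling loop. A sketch of it, where the status strings, the `fetchStatus` callback, and the injectable `sleep` are illustrative assumptions rather than Anthropic's actual client API:

```typescript
// Poll the cloud session every 3 s until the plan is ready or the
// 30-minute cloud budget elapses. Status values are assumed, not documented.

type Status = "running" | "ultraplan ready" | "failed";

async function pollForPlan(
  fetchStatus: () => Promise<Status>,
  intervalMs = 3_000,
  timeoutMs = 30 * 60_000,
  // sleep is injectable so tests can skip real waiting
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<Status> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await fetchStatus();
    if (status !== "running") return status; // ready or failed: stop polling
    await sleep(intervalMs);
  }
  return "failed"; // cloud session exceeded its window
}
```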
The Browser Review Is the Real Product
When the status indicator in your terminal changes to "ultraplan ready," you open the session link in a browser. What you see is not a text dump. It is a structured review interface built specifically for plan iteration before any code changes.
You can highlight any passage in the plan and leave an inline comment for Claude to address. You can drop an emoji reaction on a section to signal approval or flag a concern without writing a full response. An outline sidebar lets you jump between sections of a long plan without scrolling. This matters on migration plans that span 40+ files: you can go straight to the "risk section" and annotate it without reading the whole document top to bottom.
When you ask Claude to address your comments, it revises the plan and presents an updated draft. You can iterate as many rounds as needed before choosing execution. This is the workflow gap that terminal-based planning never solved: targeted feedback on specific sections rather than rejecting or accepting the whole plan.
Once you approve, two execution paths appear. "Approve and start coding in your browser" hands off to the same cloud session, which implements the plan and opens a pull request. "Approve and teleport back to terminal" sends the plan to your waiting local session, where you choose: implement in the current conversation, start a fresh session with only the plan as context, or store the plan to a file and return to it later.
[Diagram: Ultraplan: From Terminal to Cloud to Code. Local command line runs /ultraplan → cloud runtime + Opus 4.6 (up to 30 min) → browser review (comment, approve). Three plan variants, A/B assigned per the leaked source: simple_plan (no subagents, direct file exploration), diagram_plan (adds Mermaid/text-based diagrams of data flow), multi-agent (3 explorer agents + 1 critic that synthesizes). On approval, choose where execution happens. Source: code.claude.com/docs + leaked npm source, March 31 + April 2026]
Here Is the Part Nobody Is Talking About
On March 31, 2026, a packaging error pushed 512,000+ lines of Claude Code's TypeScript source to npm. Inside were the Ultraplan system prompts. They reveal that Ultraplan is not one planner but at least three variants, assigned through A/B testing.
Variant 1 (simple_plan): No subagents. Claude uses Glob, Grep, and Read to explore your codebase directly, then calls ExitPlanMode. This is regular plan mode running on cloud hardware.
Variant 2 (diagram_plan): Same as simple_plan with an added instruction to generate Mermaid or text-based diagrams showing dependency sequence, data flow, and change shape.
Variant 3 (multi-agent): Three parallel explorer agents each independently approach the problem in separate context windows. A dedicated critic agent receives all three outputs, evaluates them against the original task, and synthesizes the final plan. The critic cannot see the explorers working, which prevents self-consistency bias.
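The leaked prompts suggest one entry point dispatching to three prompt configurations. Here is a sketch of the deterministic A/B bucketing such a rollout typically uses; the hash scheme and all names are assumptions, not the actual source:

```typescript
// One planner entry point, three prompt configurations, chosen per session.

type Variant = "simple_plan" | "diagram_plan" | "multi_agent";

const VARIANTS: Variant[] = ["simple_plan", "diagram_plan", "multi_agent"];

// Deterministic bucket: the same session id always lands in the same arm,
// which keeps the A/B comparison stable across reruns.
function assignVariant(sessionId: string): Variant {
  let h = 0;
  for (const ch of sessionId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return VARIANTS[h % VARIANTS.length];
}
```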
Why the Multi-Agent Variant Works Differently
To understand why the multi-agent variant produces different output than simple_plan, you need to understand a core limitation of single-agent planning. When a language model generates tokens sequentially, each token conditions the next. The first paragraph of a plan shapes everything that follows.
If the model opens with "migrate sessions to JWTs by adding middleware," that framing anchors all subsequent reasoning. Alternative approaches get filtered out, not because they are wrong, but because they conflict with the trajectory already in progress. This is the self-consistency problem: a single agent reasoning through a plan tends to converge on its first instinct.
The multi-agent variant breaks this by running three explorer agents in parallel, each in a completely separate context window with no knowledge of what the others are producing. One explorer might prioritize performance. Another might prioritize rollback safety. A third might identify a dependency the first two missed entirely.
Because they share no intermediate state, they can arrive at genuinely different conclusions. This is not redundancy. It is a structured form of independent verification, running in parallel so the cloud session does not take three times as long.
The critic agent then receives all three outputs alongside the original task specification. Its job is not to pick the best plan outright. It evaluates which explorer's approach best addresses the stated requirements, identifies strong elements from the others worth incorporating, and flags contradictions where the explorers disagree. The synthesis is grounded in the original spec, not in which explorer happened to be most confident in tone.
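A minimal sketch of that explorer/critic flow, with the critic reduced to a scoring function for brevity (the real critic is a fourth model call that also merges strong elements and flags contradictions); every name here is hypothetical:

```typescript
// Explorers draft plans independently; the critic sees all drafts side by
// side and judges them against the original task, not against each other.

type Explorer = (task: string) => Promise<string>;

async function multiAgentPlan(
  task: string,
  explorers: Explorer[],
  // critic stand-in: score how well a draft covers the spec's requirements
  score: (task: string, draft: string) => number,
): Promise<{ best: string; drafts: string[] }> {
  // explorers share no intermediate state: each call sees only the task
  const drafts = await Promise.all(explorers.map((e) => e(task)));
  let best = drafts[0];
  for (const d of drafts) if (score(task, d) > score(task, best)) best = d;
  return { best, drafts };
}
```

The key structural property survives even in this toy form: generation is parallel and isolated, and evaluation is grounded in the task specification rather than in any single draft's framing.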
This pattern appears in research under the name Mixture of Agents. Work from Together AI showed that aggregating outputs from multiple independent model runs outperforms single-model runs on reasoning benchmarks. Ultraplan applies the same principle inside a single model family by using independent sampling to create diversity. Separating generation from evaluation produces better outputs than having one agent do both.
In practice, the multi-agent variant is most valuable for tasks with high blast radius: migrations touching authentication, database schema changes, or anything where an incomplete plan causes cascading failures across services. For a simple feature addition to a single service, simple_plan is faster and produces comparable quality. The three-explorer overhead only pays off when the cost of a wrong plan is high enough to justify it.
[Diagram: Ultraplan Architecture. Local session → Cloud Runtime · Opus 4.6 · up to 30 min, running one of three variants: simple_plan, diagram_plan (+ text diagrams), or multi-agent (3 explorers + critic; each explorer runs in a separate context window with no shared state, and the critic evaluates all three and synthesizes) → browser review → execution in the cloud or back in the terminal. Source: code.claude.com/docs + leaked npm source, April 2026]
Ultraplan may matter more as planning infrastructure than as a fixed planner. Anthropic is using the cloud review loop to test and refine planning strategies across millions of runs. The slash command is the product. The A/B data is the real asset.
How to Run It
You need Claude Code v2.1.91 or later, Claude Code on the Web enabled, and a GitHub repository connected. Ultraplan does not run on Bedrock, Vertex, or Foundry.
Three trigger paths: type /ultraplan [task] directly. Or include the word "ultraplan" anywhere in a prompt and Claude detects it. Or, after a local plan finishes and the approval dialog appears, choose "No, refine with Ultraplan." That third path is the strongest: the cloud session starts with your local plan as context rather than from scratch.
One constraint: if Remote Control is active, it disconnects when Ultraplan starts. Both use the claude.ai/code interface. One can run at a time.
Getting Good Results
Lean on that third trigger path: run a local plan first, let it finish, then choose "No, refine with Ultraplan" at the approval dialog so the cloud session inherits your local plan as starting context.
You get the speed and review surface of Ultraplan without losing the codebase familiarity your local session already built up. This is the path Anthropic recommends for complex migrations.
Use the outline sidebar before leaving any comments. On large plans, it is easy to annotate a detail in section 3 when the real issue is in section 7. Read the structure first, then target your feedback. Claude addresses comments in the order they appear, so sequencing matters.
If you want to monitor the cloud session from your terminal while it runs, type /tasks and select the ultraplan entry. You get the session link, live agent activity, and a Stop action. You do not need to keep the browser open while the plan is being drafted.