AI Agents Are Reading Your Docs. Are You Ready?
Last month, 48% of visitors to documentation sites across Mintlify were AI agents—not humans.
Claude Code, Cursor, and other coding agents are becoming the actual customers reading your docs. And they read everything.
This changes what good documentation means. Humans skim and forgive gaps. Agents methodically check every endpoint, read every guide, and compare you against alternatives with zero fatigue.
Your docs aren't just helping users anymore—they're your product's first interview with the machines deciding whether to recommend you.
That means:
→ Clear schema markup so agents can parse your content
→ Real benchmarks, not marketing fluff
→ Open endpoints agents can actually test
→ Honest comparisons that emphasize strengths without hype
In the agentic world, documentation becomes 10x more important. Companies that make their products machine-understandable will win distribution through AI.
Pets to Cattle. 90% Tail Latency. Anthropic's Agent Math.
Brain, hands, session. Three decouplings, one public rewrite.
3 decouplings | 5 core interfaces | 90%+ p95 TTFT drop
Anthropic cut p95 time-to-first-token on their hosted agent service by more than 90 percent. Not with a new model. Not with a new GPU. They did it by refusing to provision containers speculatively, and by rewriting their own agent harness from coupled to decoupled, in public.
The first version of Managed Agents coupled the harness, session, and sandbox in one container. When a container died, the session died with it. Engineers debugging stuck sessions had to open a shell inside the container, which meant touching user data. The team called it adopting a pet.
So they rewrote the whole thing. Claude Managed Agents launched in public beta on April 8, 2026, and the engineering post published alongside it is a better read than the product page.
The Problem: Harnesses Go Stale
Harnesses encode assumptions about what Claude cannot do on its own. Those assumptions go stale as models improve.
Anthropic gives a concrete example. In earlier work, Claude Sonnet 4.5 would wrap up tasks prematurely as it sensed its context limit approaching, a behavior the team called "context anxiety." They added context resets to the harness to fix it.
When they tested the same harness on Claude Opus 4.5, the behavior was gone. The resets had become dead weight.
So they built Managed Agents around interfaces meant to outlast any particular harness implementation. The team describes it as a "meta-harness": a hosted service with stable interfaces that can accommodate whichever specific harness Claude needs in the future, including the ones Anthropic runs today.
V1: Don't Adopt a Pet
The first implementation coupled everything. The session log, the agent harness, and the sandbox all shared one container. File edits were direct syscalls. There were no service boundaries to design.
It looked clean.
It was a pet. In the pets-versus-cattle analogy from cloud infrastructure, a pet is a named, hand-tended server you cannot afford to lose. Cattle are interchangeable.
The V1 container was a pet: if it failed, the session was lost. If it was unresponsive, engineers had to nurse it back to health.
Debugging was nearly impossible. Their sole window into a stuck session was the WebSocket event stream, which could not distinguish between a harness bug, a dropped packet, and a container going offline. Opening a shell inside the container meant touching user data, so the team effectively had no way to debug. And because the harness assumed all work lived inside its container, any customer who wanted Claude to reach resources in their own VPC had to either peer their network with Anthropic's or run the harness themselves.
V1: the pet
One container. Everything coupled.
  container: brain + harness, session log, sandbox, credentials
  container dies → session dies

V2: cattle
Decoupled. Each part replaceable.
  brain: stateless harness
    ↕ execute()
  hands: sandboxes, tools
    ↕ getEvents()
  session: durable event log
  vault: credentials, outside the sandbox
  any part dies → replace it
How It Works: Three Decouplings
1. Kill the container, keep the session. In V2, the harness calls containers the way it calls any other tool: execute(name, input) returns a string. The container becomes cattle.
If a container dies, the harness catches the failure as a tool-call error and passes it back to Claude. A new container can be reinitialized with provision({resources}).
The harness itself is also stateless. When one crashes, a new one reboots with wake(sessionId), pulls the event log via getSession(id), and resumes from the last event.
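A minimal sketch of that recovery loop, assuming in-memory stand-ins for the real services. The interface names (execute, getSession, wake) follow the post; every type, signature, and the bookkeeping below are illustrative guesses, not Anthropic's implementation.

```typescript
// Decoupled loop: the harness is stateless, the container is just a tool,
// and a container crash becomes an ordinary tool-call error.

type SessionEvent = { seq: number; type: string; payload: string };

interface SessionStore {
  getSession(id: string): SessionEvent[]; // durable log, lives outside the harness
  append(id: string, e: SessionEvent): void;
}

interface Sandbox {
  execute(name: string, input: string): string; // the entire hand interface
}

// A container that has died: any tool call fails.
class CrashedSandbox implements Sandbox {
  execute(): string {
    throw new Error("container offline");
  }
}

// wake(): boot a fresh harness, replay the durable log, take one step.
// A sandbox crash is recorded and handed back to the model as a tool
// result, not a lost session.
function wake(sessionId: string, store: SessionStore, sandbox: Sandbox): string {
  const log = store.getSession(sessionId);
  const lastSeq = log.length > 0 ? log[log.length - 1].seq : 0;
  let result: string;
  try {
    result = sandbox.execute("bash", "echo hello");
  } catch (err) {
    // Pass the failure back to Claude; a replacement container would be
    // provisioned lazily (provision({resources})) on the next tool call.
    result = `tool_error: ${(err as Error).message}`;
  }
  store.append(sessionId, { seq: lastSeq + 1, type: "tool_result", payload: result });
  return result;
}
```

The point of the sketch is the shape, not the details: because the harness holds nothing the log does not, killing either the harness or the container loses no state.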
2. The event log is the memory. Long-horizon tasks exceed Claude's context. Standard fixes like compaction and trimming make irreversible decisions about what to keep.
Managed Agents stores the session as a durable event log sitting outside the context window. The getEvents() interface lets the brain interrogate past context by selecting positional slices of the stream.
The brain can rewind before a specific moment, reread context before a key action, or resume from wherever it last stopped reading. The harness transforms fetched events before passing them into Claude's context window, enabling prompt-cache optimization without losing the raw record.
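A minimal sketch of the session-as-durable-log idea. getEvents() and emitEvent() follow the post's interface names; the positional-slice semantics shown here are an assumption for illustration.

```typescript
// The full history lives outside the context window; the brain pulls
// positional slices of it rather than carrying everything in-context.

type LogEvent = { pos: number; text: string };

class SessionLog {
  private events: LogEvent[] = [];

  emitEvent(text: string): void {
    this.events.push({ pos: this.events.length, text });
  }

  // Select a positional slice of the stream: rewind before a moment,
  // reread context before a key action, or resume from a position.
  getEvents(from: number, to?: number): LogEvent[] {
    return this.events.slice(from, to);
  }
}

const log = new SessionLog();
["plan", "edit file", "run tests", "tests failed", "apply fix"].forEach((t) =>
  log.emitEvent(t)
);

// Rewind: reread everything before the test run at position 2.
const beforeTests = log.getEvents(0, 2);
// Resume: pick up from where the brain last stopped reading.
const resumed = log.getEvents(3);
```

Nothing is ever thrown away: the harness can transform slices before they enter the context window while the raw record stays intact.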
3. Claude never touches the tokens. In the coupled design, untrusted code generated by Claude ran in the same container as credentials. A prompt injection simply needed to convince Claude to read its own environment to harvest tokens. Once an attacker had those, they could spawn fresh unrestricted sessions and delegate work to them.
The structural fix was to make tokens unreachable from the sandbox. For Git, each repository's access token is used once to clone the repo during sandbox initialization, then wired into the local git remote. Push and pull work from inside the sandbox, but the agent never handles the token.
For custom tools, OAuth credentials live in a secure vault outside the sandbox. Claude reaches them through a dedicated MCP proxy that receives a session token, fetches the corresponding credentials, and makes the external call. The harness is never made aware of any credentials.
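A hedged sketch of that credential boundary. The post describes a secure vault and an MCP proxy but does not publish their APIs; the Vault class, proxyCall, and the session-token flow below are illustrative stand-ins.

```typescript
// Tokens live in a vault outside the sandbox. Sandbox code can only ask
// the proxy to act on its behalf, never read the credential itself.

class Vault {
  private secrets = new Map<string, string>();

  store(sessionToken: string, credential: string): void {
    this.secrets.set(sessionToken, credential);
  }

  lookup(sessionToken: string): string | undefined {
    return this.secrets.get(sessionToken);
  }
}

// The sandbox hands the proxy only a session token. The proxy resolves the
// real credential and makes the external call itself; only the result
// string crosses back into sandbox-reachable memory, so injected code has
// nothing to harvest.
function proxyCall(vault: Vault, sessionToken: string, request: string): string {
  const credential = vault.lookup(sessionToken);
  if (credential === undefined) return "error: unknown session";
  // Pretend external call: the credential is consumed here, inside the proxy.
  return `ok:${request}`;
}
```

The security property is structural: even a fully compromised agent holding a session token can only do what the proxy allows for that session, and the token spawns no new unrestricted sessions.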
The Payoff Is In The Tail
In the coupled design, every session paid the full container setup cost before the first inference token, even if the session never touched the sandbox. Every brain needed its own container. Every session cloned repos, booted processes, and fetched pending events upfront.
Decoupling means containers get provisioned lazily, when Claude actually calls a tool. A session that does not need a sandbox right away does not wait for one. Inference can start as soon as the orchestration layer pulls pending events from the session log.
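Lazy provisioning is simple enough to sketch in a few lines. provision() and execute() mirror the post's interface names; the flag-based bookkeeping is an illustrative assumption.

```typescript
// Container setup is the expensive step, so defer it until the model's
// first actual tool call instead of paying it before the first token.

let provisioned = false;

function provision(): void {
  // In reality: clone repos, boot processes, fetch pending state.
  provisioned = true;
}

function execute(name: string, input: string): string {
  if (!provisioned) provision(); // first tool call pays the setup cost
  return `ran ${name}(${input})`;
}

// Inference starts immediately; a session that never touches the sandbox
// never waits for (or pays for) a container.
function runSession(toolCalls: string[]): boolean {
  for (const t of toolCalls) execute(t, "");
  return provisioned;
}
```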
Anthropic reports p50 TTFT dropped roughly 60 percent, and p95 TTFT dropped over 90 percent. The p95 number is the one that matters: it is the tail latency users actually feel.
Key Insights
Harness assumptions go stale, so design for "programs as yet unthought of." The hardest problem here was not any specific optimization. It was building interfaces that would outlast the models running behind them.
Anthropic borrowed the OS analogy directly: process, file, read(). Those abstractions outlived the disk packs they were built for. Managed Agents is attempting the same trick for agents.
Pets versus cattle is the right mental model for agent infrastructure. If your agent system has a component that cannot be killed and replaced without losing state, you have a pet.
Session logs belong outside the harness. Sandboxes belong outside the brain. Tokens belong outside the sandbox. Every coupling is a future outage waiting to happen.
Credentials belong outside the sandbox, always. Narrow scoping of tokens is a mitigation, not a fix. It encodes an assumption about what Claude cannot do with scoped credentials, and that assumption erodes as models improve.
The structural fix is making credentials unreachable from code Claude generates. Anthropic's Git pattern (clone with the token, then wire it into the remote so the agent never touches it) is worth copying.
The 90 percent tail-latency win came from refusing to provision speculatively. The improvement did not come from a faster container runtime or a smaller base image. It came from not provisioning containers until Claude called a tool.
Sessions that did not need compute stopped paying for it. Lazy provisioning is a textbook systems technique, but in agent infrastructure it had been obscured by the habit of coupling state to compute.
Quick Hits
Early customers already in production. Notion runs dozens of parallel coding, slides, and spreadsheet tasks inside a single workspace. Rakuten deployed specialist agents across five business functions including product, marketing, and finance, each live in under a week. Sentry built an agent that goes from flagged bug to opened pull request with no human in the loop.
Pricing, decoded. Standard Claude token rates plus $0.08 per session-hour for active runtime, measured in milliseconds. Idle time (waiting for user input or a tool response) does not count toward runtime. Web search adds $10 per 1,000 searches on top.
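As a back-of-envelope check, the stated rates can be folded into a small cost function. Token costs depend on the model and are omitted here, so this is a floor, not a quote.

```typescript
// $0.08 per active session-hour (metered by the millisecond, idle time
// free) plus $10 per 1,000 web searches, per the stated pricing.

function sessionCost(activeMs: number, webSearches: number): number {
  const runtime = (activeMs / 3_600_000) * 0.08; // ms → hours, × $0.08/hour
  const searches = (webSearches / 1_000) * 10;   // × $10 per 1,000 searches
  return runtime + searches;
}

// Example: 30 minutes of active runtime plus 50 searches.
// 0.5 h × $0.08 = $0.04; 50/1,000 × $10 = $0.50; total ≈ $0.54.
const cost = sessionCost(30 * 60 * 1000, 50);
```

The example makes the pricing shape obvious: for agentic workloads, web search dominates runtime cost long before tokens enter the picture.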
Two features in research preview. Multi-agent coordination (one brain spinning up and directing others) and automatic outcome iteration (Claude refining its own response until it meets a quality threshold). Anthropic reports the iteration feature improved structured file generation success by up to 10 points over standard prompting in internal testing.
Many brains, many hands. Because the hand interface is uniform (execute(name, input) returns a string), the harness does not care whether the sandbox is a Linux container, a phone, or, in Anthropic's own words, a Pokémon emulator. And because no hand is coupled to any brain, brains can pass hands to each other.
The Take
The headline everyone is reading is "10x faster deployment." The headline worth reading is that the first version was wrong, and the team said so in public. Most product launches hide the rewrite. Anthropic led with it.
If your agent system has a component that cannot be killed and replaced without losing state, you have a pet. Every coupling is a future outage waiting to happen.
Walk through each stateful piece of your harness this week and ask that question. Move the session log out of the harness first. Move credentials out of the sandbox second. Everything after that is optimization.
The paid archive has the full interface walkthrough (execute, getEvents, emitEvent, wake, provision) with copy-paste scaffolding patterns for migrating a coupled harness to a decoupled one.
The Open Question
The multi-agent coordination feature (one brain spawning and directing other brains) is still in research preview. The hard problem is not spawning agents. It is the coordination protocol: how brains decide which hand to use, how state gets shared across sessions, how conflicts get resolved.
Anthropic has not published that part yet. Whoever figures out the coordination primitives first will define the next layer of the stack.
Anthropic rewrote their own agent harness in public. The interfaces matter more than the implementation: brain, hands, session.
Next week: the ICLR 2026 paper that compressed KV cache memory using a math trick from 1984.


