
ResearchAudio.io · Architecture

Anthropic's hypothesized architecture for Claude Mythos

Four papers, one repo, sixteen loops of the same block.

770M looped, matches 1.3B · 4 papers in 12 months · 16× loop iters per forward pass

Two things converged this spring. A research direction that had been quietly building for a year became hard to ignore, and a community reconstruction handed everyone a runnable version of the architecture in question. Both pointed at the same primitive: the same transformer block, looped many times per forward pass, with reasoning depth set at inference rather than baked into parameter count.

The research thread runs through four papers. Geiping et al. trained Huginn-3.5B in early 2025, the first widely studied recurrent-depth language model. Lu et al. probed it in July and reported that the latent reasoning story is not yet supported by the internal evidence. Knupp et al. published the Dreamer framework in January 2026 with proper compute-matched baselines. And Chen reported a clean computational frontier on compositional generalization in March. Four papers, twelve months, one architectural shape.

The community thread is Kye Gomez's OpenMythos. Published mid-April. 10,600 stars and 2,400 forks within three weeks. The repo does not contain Claude Mythos weights, because nobody outside Anthropic has them. What it contains is a falsifiable hypothesis, written as PyTorch: that Mythos is a recurrent-depth transformer that loops the same block up to sixteen times per forward pass, with a mixture-of-experts feed-forward and a compressed key-value scheme.

The headline scaling claim is the one that ties both threads together. At 770M parameters, a looped model reaches the downstream quality of a 1.3B fixed-depth transformer trained on the same data. That is about 60% of the parameters for comparable quality. You pay for it at inference time in compute, not in model storage.

How the architecture is wired

Prelude (standard blocks, runs once) → Recurrent Block (one block, looped T times, T ≤ 16) → Coda (standard blocks, runs once)

The Recurrent Block loops, re-injecting e at every step:

h_{t+1} = A·h_t + B·e + Transformer(h_t, e)

Source: OpenMythos repo, github.com/kyegomez/OpenMythos

OpenMythos instantiates the hypothesis as a three-part structure: Prelude, Recurrent Block, Coda. The Prelude is a small stack of standard transformer layers that runs once. The Coda is the same idea on the way out. The Recurrent Block sits between them as a single transformer block looped up to T iterations, sharing weights across every iteration.

At each loop step, the hidden state updates by the rule above. Here h_t is the hidden state after iteration t, and e is the encoded input from the Prelude, re-injected at every step. That re-injection is the trick. Without it, the hidden state drifts away from the original input across deep loops and lands in noise. The learned matrices A and B control how much of the previous state and how much of the original input carry forward at each step.
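The update rule is easier to see as a loop. What follows is a minimal sketch, not the OpenMythos implementation: `toy_block` is a made-up stand-in for the transformer block, A and B are assumed to be diagonal so they reduce to per-dimension scalars, the hidden state is assumed to start at zero, and all the numbers are invented for illustration.

```python
import math

def toy_block(h, e):
    """Stand-in for Transformer(h_t, e): a bounded nonlinearity, not real attention."""
    return [math.tanh(0.5 * hi + 0.5 * ei) for hi, ei in zip(h, e)]

def recurrent_depth(e, a_diag, b_diag, num_loops):
    """Apply h_{t+1} = A*h_t + B*e + block(h_t, e), with diagonal A and B."""
    h = [0.0] * len(e)  # h_0 starts at zero (an assumption of this sketch)
    for _ in range(num_loops):
        f = toy_block(h, e)
        # e is re-injected here on every iteration, not just the first one
        h = [a * hi + b * ei + fi
             for a, b, hi, ei, fi in zip(a_diag, b_diag, h, e, f)]
    return h

e = [1.0, -2.0, 0.5]        # encoded input from the Prelude (made-up values)
a_diag = [0.5, 0.5, 0.5]    # diagonal of A, every entry below 1 in magnitude
b_diag = [0.3, 0.3, 0.3]    # diagonal of B

h_final = recurrent_depth(e, a_diag, b_diag, num_loops=16)
h_flipped = recurrent_depth([-x for x in e], a_diag, b_diag, num_loops=16)
```

Because e enters every step, the state after sixteen loops still tracks the input: flipping the sign of e flips the sign of the final state in this toy setup.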

There is a hard stability constraint. For the linear part of the recurrence to behave like a stable linear time-invariant system instead of exploding, the spectral radius of A has to stay below 1. The OpenMythos example code prints ρ(A) = 0.xxxx (must be < 1) at every initialization. If it drifts past 1, the loop is unstable and the model breaks. Treating this as a first-class training primitive, rather than a footnote, is one of the things that makes the repo readable.
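The constraint is easiest to check in one dimension, where the spectral radius of A is just |a|. The sketch below drops the transformer term and iterates only the linear part, h_{t+1} = a·h_t + b·e. The function name and every value are illustrative assumptions, not taken from the repo.

```python
def iterate_linear(a, b, e, steps, h0=0.0):
    """Iterate the scalar recurrence h_{t+1} = a*h_t + b*e and return every state."""
    states = [h0]
    for _ in range(steps):
        states.append(a * states[-1] + b * e)
    return states

# |a| < 1: the state settles at the fixed point b*e / (1 - a)
stable = iterate_linear(a=0.9, b=0.5, e=1.0, steps=200)
fixed_point = 0.5 * 1.0 / (1 - 0.9)

# |a| > 1: the same input makes the state grow without bound
unstable = iterate_linear(a=1.1, b=0.5, e=1.0, steps=200)
```

With a = 0.9 the state converges to the fixed point. With a = 1.1 it grows geometrically, which is the scalar version of the loop breaking once ρ(A) drifts past 1.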

Key Insight

The trade in a looped model is not smaller model, same quality. It is smaller stored model, more inference compute. Whether that trade is good depends entirely on whether your bottleneck is parameters (memory, distribution, edge deployment) or FLOPs (latency, cost per request).
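A back-of-envelope version of that trade, as a sketch. The parameter counts come from the scaling claim above; the 16-bit weight format and the Prelude and Coda layer counts are assumptions made purely for illustration.

```python
BYTES_PER_PARAM = 2  # assumes 16-bit weights; not stated in the source

def checkpoint_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate checkpoint size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

looped_params = 770e6   # looped model from the scaling claim
fixed_params = 1.3e9    # fixed-depth baseline it is compared against

looped_gb = checkpoint_gb(looped_params)
fixed_gb = checkpoint_gb(fixed_params)
param_ratio = looped_params / fixed_params

# Hypothetical layer counts: 2 Prelude blocks, 1 shared recurrent block, 2 Coda blocks.
PRELUDE_LAYERS, CODA_LAYERS, LOOPS = 2, 2, 16
stored_blocks = PRELUDE_LAYERS + 1 + CODA_LAYERS        # blocks kept on disk
applied_blocks = PRELUDE_LAYERS + LOOPS + CODA_LAYERS   # block applications per forward pass
```

Under these assumptions the looped checkpoint is about 1.54 GB against 2.6 GB, while each forward pass applies 20 blocks from only 5 stored ones. Storage shrinks; compute per token does not.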

The research thread

Recurrent-depth is not new as an idea (Universal Transformers proposed it years ago), but four results in the last twelve months are what made it stop feeling speculative. Each one took a different angle on the same architecture, and read together they form a real research thread.

01 / 06 · Universal Transformer (the ancestor of the loop)
token → block ↺ → halt?
stop when the token says stop.
Depth: adaptive · Stability: halt scalar · Reasoning: token · Seen in: T2T

02 / 06 · Huginn-3.5B (the first scale-up)
e → block × T → re-inject
same block, many times, at scale.
Depth: variable · Stability: re-injection · Reasoning: latent · Seen in: Huginn

03 / 06 · Huginn, probed (the audit, Lu et al.)
h₁ h₂ h₃ lens
trajectories jump, not climb.
Depth: variable · Stability: n/a · Reasoning: limited · Seen in: 2507.02199

04 / 06 · Dreamer (the proper baseline, Knupp et al.)
block × E₁ E₂ E₃
match the compute, then talk.
Depth: variable · Stability: depth mixing · Reasoning: routed · Seen in: 2601.21582

05 / 06 · Thinking Deeper (the frontier, Chen)
block × 20+ → silent obj
near-ceiling by step twenty.
Depth: 20+ steps · Stability: LayerScale · Reasoning: silent · Seen in: 2603.21676

06 / 06 · OpenMythos (the synthesis, Gomez)
P → R × 16 → C
770M weights, 1.3B work.
Depth: 16 max · Stability: ρ(A) < 1 · Reasoning: d-LoRA · Seen in: OpenMythos

The arc, in one sentence: one ancestor proposed the loop, one model proved you could train it at scale, one probing study said its internal reasoning is not what the headline implies, one framework gave it proper baselines, one paper showed it scales cleanly, and one repo wired all the threads together.

The reconstruction: OpenMythos

OpenMythos arrived in mid-April as the synthesis attempt. Not a paper, not a leaked checkpoint, just PyTorch code carrying a specific hypothesis: Claude Mythos is a recurrent-depth transformer that loops the same block up to sixteen times, with a sparse mixture-of-experts feed-forward (64 experts at the 1B scale, scaling to 512 at the 1T config) and an attention mechanism that can be switched between grouped-query attention and a compressed multi-latent variant. Model variants are pre-configured from 1B to 1T parameters.
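For readers who have not seen sparse routing up close, here is a generic top-k router sketch. It is not lifted from the repo: the function name, the choice of k = 2, and the router scores are all assumptions. Only the 64-expert count comes from the 1B configuration described above.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return {i: w / total for i, w in zip(chosen, exps)}

NUM_EXPERTS = 64  # expert count at the 1B scale
TOP_K = 2         # hypothetical; the source does not say how many experts fire per token

# Made-up router scores for one token: experts 7 and 41 score highest.
logits = [0.0] * NUM_EXPERTS
logits[7] = 3.0
logits[41] = 2.0

routing = top_k_route(logits, TOP_K)
```

Only the selected experts run for this token, so the feed-forward cost scales with k rather than with the total expert count.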

The repo also addresses the obvious failure mode. If more loops produce better answers, why not run a hundred? Because beyond a certain depth, predictions degrade. The hidden state drifts past the right answer and into noise. OpenMythos handles this with adaptive computation time halting, a learned scalar per token position that decides when to stop looping. Tokens that have converged halt early. Tokens that are still uncertain keep iterating. You pay compute where you need it.
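The halting logic can be sketched in a few lines. This is a simplified ACT-style counter under stated assumptions, not the repo's code: the halting probabilities are hard-coded rather than produced by a learned head, the 0.99 threshold is a hypothetical choice, and the weighted mixing of intermediate states that full adaptive computation time performs is left out.

```python
def loops_per_token(halt_probs, max_loops=16, threshold=0.99):
    """Return how many loop iterations each token runs before halting.

    halt_probs[i][t] is the halting probability of token i at loop step t.
    A token stops once its cumulative probability reaches the threshold,
    or when it hits max_loops, whichever comes first.
    """
    counts = []
    for probs in halt_probs:
        cumulative = 0.0
        steps = 0
        for p in probs[:max_loops]:
            steps += 1
            cumulative += p
            if cumulative >= threshold:
                break
        counts.append(steps)
    return counts

confident = [0.7, 0.4] + [0.0] * 14   # converges fast: crosses the threshold at step 2
uncertain = [0.05] * 16               # never crosses it: runs all 16 loops

counts = loops_per_token([confident, uncertain])
```

The confident token exits after two iterations and the uncertain one uses the full budget, which is the "pay compute where you need it" behavior in miniature.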

The second addition is a Depth-Wise LoRA adapter. The base block weights are shared across every iteration, but a small rank-r adaptation matrix gets injected at each loop depth. So loop step 3 looks slightly different from loop step 12, even though they share the same base block. This bridges the gap between pure weight-tying (every iteration identical) and full unrolling (every iteration its own block), and it is one of the more direct responses to the Lu et al. critique: if each iteration looks the same, of course the probing study sees discontinuous trajectories.
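A toy version of the idea, assuming the standard LoRA form W + up·down with one adapter pair per loop depth. The helper names, matrix sizes, and adapter values are illustrative, and the d_model = 1024, rank = 8 figures in the count are hypothetical rather than OpenMythos settings.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def effective_weight(base, up, down):
    """Return base + up @ down: the shared weight plus one depth's low-rank delta."""
    delta = matmul(up, down)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

def weight_counts(d_model, rank, loops):
    """Count weights for one d_model x d_model matrix under three sharing schemes."""
    tied = d_model * d_model
    adapters = loops * 2 * d_model * rank
    return {"tied": tied, "depth_lora": tied + adapters, "unrolled": loops * tied}

D, RANK, LOOPS = 4, 1, 16  # toy sizes so the matrices stay readable
base = [[1.0 if i == j else 0.0 for j in range(D)] for i in range(D)]
# One (up, down) pair per loop depth: up is D x RANK, down is RANK x D.
adapters = [([[0.01 * (t + 1)] for _ in range(D)], [[1.0] * D])
            for t in range(LOOPS)]

w_step3 = effective_weight(base, *adapters[2])     # loop step 3
w_step12 = effective_weight(base, *adapters[11])   # loop step 12

counts = weight_counts(d_model=1024, rank=8, loops=16)
```

Steps 3 and 12 end up with different effective weights while sharing the same base. At the hypothetical sizes, the adapters add about 0.26M weights to a 1.05M shared matrix, against 16.8M for full unrolling.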

The Take

Both threads are pointing at the same conclusion from different ends. The research side is firming up the case that depth-recurrence is a real scaling axis, with proper baselines (Dreamer) and a clean generalization signal (Chen's computational frontier). The community side is building runnable scaffolding that combines four research threads in one codebase: looped transformers, continuous-latent reasoning, fine-grained mixture-of-experts routing, and key-value compression. Whether or not Anthropic's Mythos actually looks anything like OpenMythos, the convergence is the story.

The deeper bet is on inference-time compute as a serious scaling axis. If recurrent depth holds up at scale, the cost structure of frontier models shifts. You ship a smaller checkpoint that runs longer on hard prompts. Storage costs less. Distribution gets easier. Inference gets more expensive, but you control how much per request, per token, per position. The Lu et al. probing study is the caveat: this story holds together if the loops are doing something different at each step, and that part is still unresolved.

Four papers and one repo are pointing at the same architecture. The next generation of frontier models will not be deeper. They will be loopier.

The Open Question

If the same block runs sixteen times in a row, does each iteration learn meaningfully different behavior? Or does the block collapse toward a fixed function that does roughly the same thing at every step? Depth-Wise LoRA is supposed to differentiate the iterations, but at low rank it is soft pressure, not a hard constraint. The probing studies on Huginn-3.5B suggest the answer is messy: hidden state trajectories do not improve monotonically toward the right answer; they jump around.

If you have a clean way to measure whether loop iteration t is doing something different from loop iteration t+1, reply to this email. I want to read it.

Next Week

The probing study that says Huginn's latent reasoning is mostly illusion. Where the rank trajectories actually go, layer by layer, and why explicit chain-of-thought still wins on arithmetic.

ResearchAudio.io · AI research, audited and explained.

Research thread: Geiping et al. (2025) · Lu et al. (2025) · Knupp et al. (2026) · Chen (2026)

Reconstruction: github.com/kyegomez/OpenMythos
