Meta’s Brain2Qwerty v2: more data, real-time, and still a long way from a patient

Brain2Qwerty v2: 9 healthy volunteers, 39% mean WER, 0 patients tested. The numbers, audited.

researchaudio.io  ·  Issue 14  ·  2026-06-30

Headline comparison

v1: ~40% mean word accuracy.
v2: 61% mean word accuracy.

The progress is real. The product is not.

The headline “non-invasive brain-to-text” sounds patient-ready. The numbers are not.

Brain2Qwerty v2 is the second release from the same Meta / BCBL team behind v1 (the v1 paper just landed in Nature Neuroscience). v2 collects 10× more data per participant, runs in real time, and uses a three-module hierarchical decoder (Conformer encoder → word-level Aligner → LLM sentence rewriter). On those merits alone, it is a genuine engineering step.

But the framing matters. The blog and project page lead with “10× more data” and “78% best-participant word accuracy.” They do not lead with the 39% word error rate. They do not lead with the 9 healthy volunteers, the fridge-sized MEG scanner, the EEG results that were dropped from the v2 narrative, or the missing patient cohort.

That’s the story.

What Meta says

The official line: “Brain2Qwerty v2 decodes complete and meaningful sentences solely from MEG signals of healthy volunteers, and reaches up to 78% word accuracy for the best participant” (project page, last-modified 2026-06-30). The blog leans on three claims: 10× more training data per person, real-time (online) inference with no keypress timing required, and a hierarchical architecture that decodes letters, words, and sentences jointly.

All three are true. None of them are the part the patient cares about.

What the numbers actually show

Metric                            v1                           v2
Typed sentences per participant   ~2,200                       ~22,000 (10×)
Healthy volunteers                35                           9
Word error rate (mean)            ~60%                         39%
Word accuracy (mean)              ~40%                         61%
Best-participant word accuracy    48%                          78%
Real-time inference               No (needs keypress timing)   Yes
Patient cohort                    0                            0

Two things to notice.

First, v2 is genuinely better on the headline accuracy metric, both at the mean (61% vs 40% word accuracy) and at the best-participant ceiling (78% vs 48%). The 10× data increase appears to have moved the needle.

Second, the ceiling is still 78% for the best person, with 39% mean WER across only 9 healthy adults, all sitting inside a MEG machine. The bar for “useful” assistive communication is not “decodes three out of four words.” It is “decodes most of them, reliably, on someone who cannot type.” We are not there.
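"Roughly 4 of every 10 words wrong" is only an approximation of WER, so it is worth pinning down. WER is the word-level edit distance (substitutions + insertions + deletions) between the decoded sentence and the typed one, divided by the number of typed words; CER is the same formula over characters. Below is a minimal sketch of that conventional definition, assuming the paper uses the standard metric (its exact text normalization, such as case and punctuation handling, is not stated here). The example pair reuses the article's own "tha robut mooves vary fast" illustration.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))          # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                  # delete ref token r
                curr[j - 1] + 1,              # insert hyp token h
                prev[j - 1] + (r != h),       # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: the same formula over characters, spaces included."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


ref = "the robot moves very fast"
hyp = "tha robut mooves vary fast"
print(f"WER = {wer(ref, hyp):.2f}")   # 4 of 5 words need an edit: prints "WER = 0.80"
print(f"CER = {cer(ref, hyp):.2f}")   # 4 character edits over 25 characters: prints "CER = 0.16"
```

Because insertions count, WER can exceed 100% on a badly wrong output. In the table above, word accuracy is simply 1 − WER (61% = 1 − 39%).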

Word accuracy (mean, v1 vs v2); one █ ≈ 4 percentage points
v1   [██████████               ] 40%
v2   [███████████████          ] 61%

Best-participant word accuracy
v1   [████████████             ] 48%
v2   [████████████████████     ] 78%

The bar a patient actually needs (rough)
     [████████████████████████ ] ~95%+

How it actually works

In plain English: someone wears a helmet full of magnetic sensors, types a sentence they just heard, and a model guesses the words from the shape of the brain signal alone, while they type, in real time.

The deeper version, in three parts:

  1. Conformer encoder reads the MEG signal as the person types and outputs character probabilities. This is the v1-style “what finger just moved” signal.
  2. Word-level Aligner clusters those characters into word embeddings, so the model reasons at the word level, not just the letter level.
  3. Character-level language model (a fine-tuned LLM) rewrites the noisy character stream into a clean English sentence. This is why the model can output “the robot moves very fast” when the raw character stream was something like “tha robut mooves vary fast.” The LLM is doing the cleanup, the same way your phone’s keyboard fixes typos.

The new piece in v2 is that the pipeline runs asynchronously: it does not need to know when a key was pressed. It watches a continuous MEG recording and emits characters as they appear. That is the engineering contribution; the accuracy bump is a data contribution.
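The article later describes the encoder's output as a "noisy CTC stream." Assuming the encoder is trained with a CTC-style objective, this is the generic way such a model turns a continuous recording into characters without keypress timestamps: pick the most likely symbol in each time frame, merge consecutive repeats, and drop the blank symbol. It is textbook best-path CTC decoding, not necessarily the decoder Brain2Qwerty ships; the alphabet, the blank symbol, the `toy_frames` helper and the probabilities are all hypothetical.

```python
BLANK = "_"                                   # hypothetical blank symbol
ALPHABET = [BLANK, " ", "a", "h", "i", "l"]   # hypothetical toy character set


def ctc_greedy_decode(frame_probs: list[list[float]], alphabet: list[str]) -> str:
    """Best-path CTC decoding.

    frame_probs holds one probability distribution over `alphabet` per time
    frame of the continuous signal; no keypress timestamps are needed.
    """
    out: list[str] = []
    prev = -1
    for row in frame_probs:
        idx = max(range(len(row)), key=row.__getitem__)   # most likely symbol
        # Emit only when the symbol changes, and never emit the blank.
        if idx != prev and alphabet[idx] != BLANK:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


def toy_frames(symbols: str, alphabet: list[str], confidence: float = 0.9) -> list[list[float]]:
    """Hypothetical helper: fake per-frame distributions peaking on `symbols`."""
    floor = (1 - confidence) / (len(alphabet) - 1)
    frames = []
    for s in symbols:
        row = [floor] * len(alphabet)
        row[alphabet.index(s)] = confidence
        frames.append(row)
    return frames


# What an encoder might emit frame by frame: repeats and blanks included.
frames = toy_frames("hh_i__ _hii_ll_l", ALPHABET)
print(ctc_greedy_decode(frames, ALPHABET))   # prints "hi hill"
```

The blank is what separates a genuine double letter ("l", blank, "l" gives "ll") from one letter held across several frames ("l", "l" gives "l").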

Two honest caveats inside the paper itself:

  • The full v2 pipeline is worse at character-level decoding than its Encoder alone (CER 0.31 vs 0.28). The LLM produces semantically clean sentences by smoothing over hard character-level errors, which means the model is editing its own output. That is impressive, and it is also the part of the result that does not generalize.
  • For the worst subject, v2’s output is a coherent but entirely different sentence. The paper says so plainly. The decoder can produce a fluent, plausible sentence that is not the sentence the person typed. For a communication aid, that is a category of error you do not want.

Where it works / where it collapses

Where it works

  • Decodes full English sentences from non-invasive MEG, which v1 could not do in real time.
  • Scales with data: the paper reports a log-linear improvement in accuracy as training data grows, with no plateau yet.
  • Code is open (458 stars, 57 forks as of 2026-06-30). Anyone can re-run the v1 pipeline.
  • For the best participant, 78% word accuracy on a real-time MEG stream is a real result.

Where it collapses

  • 9 healthy volunteers. None of them have a condition that prevents them from typing. The clinical population is unstudied.
  • MEG is a superconducting helmet the size of a small fridge, kept at liquid-helium temperatures, available in maybe a hundred research centers worldwide. “Non-invasive” describes the surgery, not the device.
  • 39% mean WER means roughly 4 of every 10 words are wrong, averaged across nine healthy volunteers, in a controlled research setting.
  • The model can output the wrong sentence confidently. Worst-subject case: “coherent but entirely different sentence” (the paper’s own language).
  • v2 dropped EEG from the main narrative. v1 had EEG at 67% CER. Wearable EEG is the actual mass-market non-invasive sensor. v2 is not on that sensor.

“For the best subject, the model produces either perfect or near-perfect decoding. For the worst subject, the output can be a coherent but entirely different sentence.”

From the Brain2Qwerty v2 paper

Qualitative failures

No community signal worth reporting this week (HN thread: 10 points, 0 comments). Failure modes below are from the paper itself.

  1. Subject collapse. 9 healthy adults, 22,000 sentences each, 10 hours of recording. The paper’s “best participant” is one of nine. There is no result for a person with motor impairment.
  2. Modality collapse. v2 is MEG-only. v1 ran on EEG too, with a 67% CER. EEG is the only sensor that fits the “non-invasive, accessible” pitch. v2 abandons it.
  3. Error-shape failure. The LLM cleanup produces fluent, sensible-sounding sentences that can still contain the wrong words. For casual reading that is fine. For a person trying to say “call my daughter, not my son,” it is not.
  4. Hardware failure. The MEG scanner is a clinical-grade installation. “Non-invasive” describes the absence of surgery, not the absence of friction. Most patients will never sit in one.
  5. Generalization failure. All v2 sentences are English, from a corpus participants heard and then typed. The model decodes that task. It does not decode free composition.

What this means for

Junior engineer

The interesting part is the LLM cleaning up a noisy CTC stream. That pattern (encoder → LM rewriter) generalizes to speech, handwriting, sign-language gloves, anything with a noisy character stream and a high-level language model available. Worth re-implementing on your own noisy-signal dataset to see the lift.
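The encoder → rewriter pattern can be tried in miniature before reaching for an LLM. In this deliberately tiny stand-in for the rewriter stage, each noisy word is snapped to the nearest word in a fixed vocabulary by edit distance. That is a swapped-in and much weaker technique (no context, no grammar, no fine-tuning), and the vocabulary and function names are hypothetical, but it reproduces the article's "tha robut mooves vary fast" cleanup and shows where the stage sits.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def rewrite(noisy: str, vocabulary: list[str]) -> str:
    """Toy rewriter: replace every noisy word with its closest vocabulary word."""
    return " ".join(
        min(vocabulary, key=lambda w: edit_distance(word, w))
        for word in noisy.split()
    )


VOCAB = ["the", "robot", "moves", "very", "fast", "slow"]   # hypothetical lexicon
print(rewrite("tha robut mooves vary fast", VOCAB))          # prints "the robot moves very fast"
```

Because this stage always returns an in-vocabulary word, its mistakes come out looking fluent, which is the same property behind the error-shape failure above.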

Senior engineer

The “log-linear improvement with no plateau” claim is the one to watch. If it holds at 100k sentences, the gap to invasive BCIs gets uncomfortable for the implant folks. The right next move is replication, not architecture: a second lab, a second language, a second scanner.
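To make "if it holds at 100k sentences" concrete, here is a back-of-the-envelope log-linear line, accuracy = slope × log10(sentences) + intercept, drawn through the only two points in the table above (~2,200 sentences at ~40%, ~22,000 at 61%). Loud caveat: those points come from different models (v1 vs v2) and different cohorts (35 vs 9 people), so this is a toy two-point line, not the paper's fitted scaling curve.

```python
import math

# Toy two-point fit. These come from different models and cohorts,
# so this is NOT the paper's scaling curve, just illustrative arithmetic.
V1 = (2_200, 40.0)    # (~sentences per participant, mean word accuracy in %)
V2 = (22_000, 61.0)


def fit_log_linear(p1, p2):
    """Return (slope, intercept) for accuracy = slope * log10(n) + intercept."""
    (n1, a1), (n2, a2) = p1, p2
    slope = (a2 - a1) / (math.log10(n2) - math.log10(n1))
    return slope, a1 - slope * math.log10(n1)


def predict(n, slope, intercept):
    """Accuracy (%) the line predicts at n sentences per participant."""
    return slope * math.log10(n) + intercept


def sentences_needed(target, slope, intercept):
    """Invert the line: sentences per participant needed to reach target (%)."""
    return 10 ** ((target - intercept) / slope)


slope, intercept = fit_log_linear(V1, V2)
print(f"{slope:.0f} points per 10x data")
print(f"at 100k sentences: {predict(100_000, slope, intercept):.1f}%")
print(f"for 95%: ~{sentences_needed(95, slope, intercept):,.0f} sentences")
```

The line gives 21 points per 10× data, about 75% mean accuracy at 100k sentences, and roughly 0.9 million sentences per participant to reach 95%. A log-linear law also cannot hold indefinitely (it eventually passes 100%), so "no plateau yet" describes the measured range only.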

Hiring manager

The skill profile this project actually rewards: signal processing fundamentals (the Conformer is doing real work), CTC training (a quietly rare skill), LLM fine-tuning for sequence cleanup, and the patience to collect 10 hours of brain data per person. That is not a job description you will find on a job board. You will find the people at BCBL, Meta FAIR, and a handful of academic MEG labs.

Founder

Do not build a startup on this. The hardware will not be wearable in 2026. The patient data does not exist. The most defensible move is to build the LM-cleanup layer for adjacent noisy-signal products (silent-speech interfaces, electromyography keyboards) and wait for the MEG hardware to catch up.

The metric that actually matters

The number Meta leads with:                   78% best-participant word accuracy
The number Meta buries:                       39% mean WER
The number that has to hit for clinical use:  ~95% word accuracy
                                              at 60+ wpm, on patients
The number of patients tested:                0
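One way to see why the bar sits near 95%: if each word were decoded correctly and independently with probability p, a 10-word sentence would come through with zero errors with probability p^10. Independence is a simplifying assumption (an LLM rewriter makes errors correlated), and treating word accuracy as a per-word probability only approximates a WER-based figure, so read this as intuition rather than a result.

```python
def p_sentence_correct(word_accuracy: float, n_words: int) -> float:
    """Chance that every word in an n-word sentence is right, ASSUMING each
    word is decoded correctly and independently with probability word_accuracy."""
    return word_accuracy ** n_words


for label, acc in [("v2 mean", 0.61), ("v2 best participant", 0.78), ("clinical bar", 0.95)]:
    share = p_sentence_correct(acc, 10)
    print(f"{label:>20}: {share:.1%} of 10-word sentences fully correct")
```

Under those assumptions, 61% yields under 1% fully correct 10-word sentences, 78% about 8%, and even 95% only about 60%.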

The v1 → v2 jump is real. The framing is also real. v2 is a research artifact, not a communication aid, and the gap between those two things is the only number worth tracking.

If you share one thing

9 healthy volunteers, 39% mean WER, 0 patients tested. That is the gap between “non-invasive brain-to-text” and a product.

Closing

The progress is real. The product is not. Don’t confuse the curve for the destination.

Reader challenge

Three questions to sit with this week

  1. If the LLM rewriter is doing 30% of the decoding work, what does “Brain2Qwerty decoded a sentence” even mean?
  2. What is the smallest wearable MEG unit you would trust to type a sentence to your spouse?
  3. Would you publish a 78% accuracy number on a clinical communication aid?

Next issue

What the Nature Neuroscience acceptance of v1 actually implies for non-invasive BCIs, and why the regulatory path is the harder half of the story.

-- researchaudio.io
