
ResearchAudio.io

OpenAI Found the Bugs. Humans Read Every One.

Inside the Daybreak pipeline: judging agents, false positives, and a 23-year-old flaw.

On June 22, OpenAI detailed a security effort in which its most cyber-capable model read through millions of lines of open-source code and flagged hundreds of issues. The counterintuitive part is what came next. A human security engineer reproduced and checked every finding before it reached a maintainer.

Discovery was the easy part. Validation was the constraint.

What it is

The effort is called Patch the Planet, a Daybreak initiative built with the security firm Trail of Bits. Daybreak is OpenAI's program for pointing frontier models at defensive security. Trail of Bits committed its entire security research group to the first surge, working with maintainers to validate issues, write and test patches, and coordinate disclosure.

The targets are infrastructure that almost everything depends on. Participating projects include cURL, Python, the Go project, pyca/cryptography, Sigstore, and aiohttp. Across 19 projects, OpenAI reports the team identified hundreds of security issues and merged dozens of patches, with more still moving through coordinated disclosure.

30M+ Linux kernel lines scanned

24 Linux privilege-escalation exploits generated

880,000+ sites exposed by the HTTP/2 Bomb

How the pipeline works

The reusable core was a variant-discovery pipeline. The system ingests years of public vulnerability history, extracts the underlying flaw patterns, then searches target codebases for related bugs. Candidates pass through specialized judging agents that remove duplicates and filter likely false positives. The strongest evidence reaches a human for confirmation; weaker candidates are dropped. Trail of Bits found the models were most effective at exactly this kind of variant analysis, surfacing fresh instances of known bug classes.

Public flaw history (known patterns) → Model scans code (high recall) → Judge agents filter (dedup, false positives) → Human confirms (reproduce, rescore)

Hundreds of candidates in. Dozens of confirmed patches out. The human gate is where the funnel narrows.

Source: OpenAI Patch the Planet writeup, June 2026
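The funnel can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's or Trail of Bits' implementation: the Candidate fields, the 0-to-1 confidence score, and the 0.7 threshold are all assumptions made for the example. It shows only the shape of the flow: deduplicate, drop weak candidates, and hand the remainder to a human reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    project: str
    file: str
    pattern: str       # known bug class the scan matched (hypothetical label)
    confidence: float  # judge-assigned score in [0, 1] (hypothetical scale)


def dedup(candidates):
    """Keep the highest-confidence candidate per (project, file, pattern)."""
    best = {}
    for c in candidates:
        key = (c.project, c.file, c.pattern)
        if key not in best or c.confidence > best[key].confidence:
            best[key] = c
    return list(best.values())


def judge_filter(candidates, threshold=0.7):
    """Drop candidates the judges score below the (assumed) threshold."""
    return [c for c in candidates if c.confidence >= threshold]


def human_queue(candidates, threshold=0.7):
    """Return what a human should review, strongest evidence first.

    Nothing here is submitted to a maintainer; the output is only a
    review queue for the human gate.
    """
    survivors = judge_filter(dedup(candidates), threshold)
    return sorted(survivors, key=lambda c: c.confidence, reverse=True)


if __name__ == "__main__":
    raw = [
        Candidate("projA", "a.c", "use-after-free", 0.9),
        Candidate("projA", "a.c", "use-after-free", 0.6),   # duplicate
        Candidate("projB", "b.c", "integer-overflow", 0.4),  # weak
        Candidate("projC", "c.c", "use-after-free", 0.8),
    ]
    for c in human_queue(raw):
        print(c)
```

The design point is that the automated stages only shrink the queue. The decision to report stays with the person at the end.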

A second technique was differential testing at scale. Different implementations of the same protocol should behave the same way on the same input. When they diverge, one of them likely has a bug. The hard part is normally the glue code that connects each implementation to a shared test harness, which the model generated and refined. Work that has historically taken weeks or months produced high-signal candidates within days.
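A toy version of differential testing makes the idea concrete. The sketch below is an assumption-laden illustration, not the harness from the writeup: the two parsers are hypothetical stand-ins for independent implementations of the same rule (reading a numeric header value), and the harness simply reports inputs on which they disagree.

```python
import re


def parse_strict(value: str):
    """Toy implementation A: accepts ASCII digits only."""
    if re.fullmatch(r"[0-9]+", value):
        return int(value)
    return None  # reject everything else


def parse_lenient(value: str):
    """Toy implementation B: accepts whatever Python's int() accepts."""
    try:
        return int(value)
    except ValueError:
        return None


def differential_test(impls, inputs):
    """Return (input, results) pairs where the implementations diverge."""
    divergences = []
    for item in inputs:
        results = {name: fn(item) for name, fn in impls.items()}
        if len(set(results.values())) > 1:
            divergences.append((item, results))
    return divergences


if __name__ == "__main__":
    impls = {"strict": parse_strict, "lenient": parse_lenient}
    inputs = ["42", "+42", " 42", "4_2", "abc", ""]
    for item, results in differential_test(impls, inputs):
        print(repr(item), results)
```

Each divergence is a candidate, not a confirmed bug: someone still has to decide which implementation violates the specification. In real use, the expensive part is the adapter that feeds identical input to each implementation, which is the glue code the article says the model generated.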

What it found

The findings span the whole stack. On the Linux kernel, the model worked across more than 30 million lines of code and automatically generated 8 proof-of-concept exploits for information leaks and 24 for local privilege escalation, a subset of the hundreds of issues identified. In OpenBSD, it surfaced a 23-year-old memory-safety bug in the kernel's System V semaphore code: a use-after-free, where memory could be accessed after it was released. OpenAI confirmed the flaw could let an unprivileged local user escalate to root.

Browsers were not spared. The team reported five exploitable flaws in Chrome's V8 JavaScript engine, three of them caught and fixed within days of being introduced. Roughly a week of focused WebKit work surfaced more than 10 exploitable Safari flaws. A WebAssembly flaw found during OpenAI's safety evaluations was patched by Mozilla two days before Pwn2Own Berlin, after which five of six registered Firefox entries withdrew, and no Firefox exploit was demonstrated at the contest. Separately, the partner firm Calif used the tooling to find an HTTP/2 denial-of-service technique it called the HTTP/2 Bomb, which its analysis suggested affected more than 880,000 internet-facing sites running servers including Nginx, Apache, and Pingora.

The detail that ties it together is the human gate. Trail of Bits manually reviewed every issue before submitting it to a maintainer: reproducing the evidence, checking it against project documentation and threat models, removing duplicates, and reassessing severity. The writeup is blunt that frontier models produce a high volume of false positives, which would otherwise add to the backlog maintainers already carry. Maintainers stayed in control of which patches shipped and how disclosure was handled.

Why it matters for builders

Verification is the scarce resource. The pipeline exists because high recall comes with many false positives, so expert confirmation, not detection, is the bottleneck. If you are building any find-then-act agent, fund the verification layer with the same seriousness as the generator.

Known bugs are a search strategy. The most effective method here was variant analysis: take a fixed flaw pattern, then hunt for other instances of it across a codebase, with judging agents filtering before a human looks. Your past incidents and patches are inputs for finding the next bug.

The infrastructure outlasts the findings. Beyond the bugs, the sprint left behind fuzzing harnesses, differential-testing setups, and property tests grounded in each project's specifications. A fuzzing lab that would take several weeks to build manually was assembled in under a day, and it keeps working after the first patches land.
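The "known bugs are a search strategy" point can be illustrated with a small static search. The sketch below is a simplified stand-in, not the model-driven pipeline the writeup describes: it assumes a past incident traced back to a call passing shell=True, and uses Python's standard ast module to find other call sites with the same shape. The pattern choice and the SAMPLE source are hypothetical.

```python
import ast

# Hypothetical code under review; in practice this would be read from files.
SAMPLE = '''
import subprocess
def run(cmd):
    subprocess.run(cmd, shell=True)
def safe(args):
    subprocess.run(args)
def also(cmd):
    subprocess.Popen(cmd, shell=True)
'''


def find_variants(source: str, filename: str = "<src>"):
    """Return sorted (line, callee) pairs for calls that pass shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                # ast.unparse requires Python 3.9 or later.
                hits.append((node.lineno, ast.unparse(node.func)))
    return sorted(hits)


if __name__ == "__main__":
    for line, callee in find_variants(SAMPLE):
        print(f"line {line}: {callee}(..., shell=True)")
```

A syntactic match like this has high recall and low precision, which is the trade-off the article describes: every hit still needs filtering and a human look before it counts as a finding.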

The takeaway for anyone shipping with frontier models is narrow and useful. As the cost of finding candidates falls toward zero, the durable advantage moves to whatever you place between the model and the irreversible action.

Source: Patch the Planet, OpenAI Daybreak, June 22, 2026.
