| ResearchAudio.io · Issue #47 · May 4, 2026 |
6 min read |
An open-source spec just made OpenAI’s pull requests jump 500%.
The bottleneck was never the model. The control plane shifted to Linear.
500% · landed pull requests in 3 weeks
3–5 · Codex sessions one engineer can manage
~1000 · lines in the entire spec
The Lead
Six months ago, three engineers at OpenAI ran an experiment: build an internal productivity tool with no human-written code, every line generated by Codex.
It worked. Then they hit a wall they did not see coming.
Each engineer could comfortably manage three to five Codex sessions at once. Past that, productivity dropped. They forgot which session was doing what. They jumped between terminals to nudge agents. They debugged stalled tasks for hours.
The agents were fast. Human focus was the system bottleneck.
So they stopped supervising agents and started supervising tickets. The schedulable unit changed.

The Schedulable Unit Just Changed

Old world: the prompt is the schedulable unit. Devs paste prompts into chat and supervise sessions, so the bottleneck moves from writing to reviewing.
New world: the ticket is the schedulable unit. Tickets flow autonomously from the tracker through Symphony to Codex agents, which open and land PRs.

[Diagram: tickets route from Linear via Symphony to Codex, ending in merged PRs. Based on OpenAI engineering, April 27, 2026.]

The Mechanism

Symphony watches the issue tracker on a fixed cadence. Every open ticket gets a dedicated workspace. A Codex agent runs in that workspace continuously, opens the pull request, drives it through continuous integration, and stops at a handoff state like Human Review. If the agent crashes, Symphony restarts it. If new work shows up, Symphony picks it up.
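That loop is simple enough to sketch. Below is a minimal, hypothetical reconciliation pass in Python; the function names, states, and data shapes are my assumptions for illustration, not Symphony's actual API. The point it demonstrates is statelessness: every pass derives its actions entirely from what the tracker says right now.

```python
# A sketch of a stateless reconciliation pass, in the spirit of Symphony.
# State names and shapes are illustrative assumptions, not the real spec.

ACTIVE_STATES = {"Todo", "In Progress"}   # tickets that count as agent work
HANDOFF_STATE = "Human Review"            # where the agent stops

def reconcile(tickets, workspaces):
    """Return the actions one polling pass should take.

    tickets:    {ticket_id: state} as freshly read from the tracker
    workspaces: {ticket_id: "running" | "crashed"} for live agent workspaces
    """
    actions = []
    for ticket_id, state in tickets.items():
        if state not in ACTIVE_STATES:
            continue  # handed off or closed; nothing for an agent to do
        status = workspaces.get(ticket_id)
        if status is None:
            # new work showed up: give it a dedicated workspace and agent
            actions.append(("create_workspace", ticket_id))
        elif status == "crashed":
            # agent died: restart it, since the tracker still wants the work
            actions.append(("restart_agent", ticket_id))
    return actions
```

Because no state survives between passes, the orchestrator can itself crash and restart without losing anything; the tracker is the only source of truth.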
The control plane shifted from the editor to the issue tracker: Linear became the durable database, and Symphony itself is stateless.
Three things make this scale where session-based agents do not. First, a ticket can represent much larger work than a single PR. The agent can analyze the codebase, propose a plan, then break the work into a tree of dependent subtasks. Second, agents file new tickets themselves when they spot improvements during implementation. Third, the workflow rules live in a versioned markdown file inside the repo, the same way you version code.
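The subtask-tree idea reduces to a dependency check: a subticket becomes dispatchable only once everything it depends on has merged. Here is a hedged sketch of that check; the field names are hypothetical, not Symphony's schema.

```python
# Hypothetical sketch of the ticket-to-subtask tree: a subtask is ready to
# hand to an agent only when all of its prerequisite subtasks have merged.

def dispatchable(subtasks, done):
    """subtasks maps id -> set of prerequisite ids; done is merged ids."""
    return sorted(t for t, deps in subtasks.items()
                  if t not in done and deps <= done)
```

For a plan → API → UI chain, each completed merge unlocks the next layer of the tree, so agents can work breadth-first across whatever is currently unblocked.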

Symphony Orchestration Topology

[Diagram: Linear feeds Symphony, the stateless spec layer at ~1000 LoC, which dispatches to Codex agent workspaces 1–3; pull requests and state flow back to Linear, closing the loop.]

OpenAI’s 500% pull request boost did not come from a smarter model. It came from one markdown file.

The Application

You do not need to wait. The whole orchestrator is two markdown files in your repo: a spec that defines the system and a workflow file that captures your team’s policy. The reference implementation is open source on GitHub, with 15K+ stars two weeks after release. Codex built the Elixir version in one shot. The team then asked Codex to reimplement it in TypeScript, Go, Rust, Java, and Python to surface ambiguities. Six languages, same spec, same behavior.
This week, write your team’s workflow file. Three sections. Active states (which tickets count as work). Hooks (what runs after workspace creation, before agent dispatch, after PR open). Handoff (where the agent stops, where the human starts). Pilot it on one repo. Measure landed pull requests at week three.
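As a starting point, here is what such a file might look like. The three section names come from the article; the keys and hook script paths are illustrative guesses, since the source does not publish the file format.

```markdown
## Active states
- Todo
- In Progress

## Hooks
- after-workspace-create: ./scripts/bootstrap.sh
- before-agent-dispatch: ./scripts/check-ticket-scope.sh
- after-pr-open: ./scripts/request-review.sh

## Handoff
- Agent stops at: Human Review
- Human resumes from: the open PR and its review packet
```

Because the file lives in the repo, policy changes ship through the same review process as code changes.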
Symphony is also dramatically smaller than the agent frameworks teams have been gluing together for the past year. The spec is the artifact, not the runtime.
The starter workflow file and triage heuristic I drafted while researching this piece are in the paid ResearchAudio archive.

Symphony vs Agent Frameworks

| Metric           | Symphony         | LangGraph       | CrewAI      | AutoGen          |
|------------------|------------------|-----------------|-------------|------------------|
| State store      | Linear (tracker) | Checkpointer DB | In-process  | Conversation log |
| Runtime          | Stateless daemon | Python graph    | Python crew | Python chat      |
| Lines of code    | ~1000            | ~50k            | ~30k        | ~80k             |
| License          | Apache 2.0       | Permissive      | Permissive  | Permissive       |
| Schedulable unit | Ticket           | Function/node   | Role        | Prompt           |

The spec is the artifact, not the runtime.

The Take

The harness is the moat. The model is the commodity. Anthropic, OpenAI, and Google all ship strong coding agents. The teams that pull ahead are the ones who treat their codebase as a teammate handoff: tests on every PR, machine-readable docs, explicit guardrails, and a versioned workflow file. Symphony is the strongest evidence yet that the next 18 months of agent leverage will be won at the orchestration layer, not the model layer.
There is a second layer to read here. OpenAI is following Joel Spolsky’s playbook: commoditize your complement. The strongest position for a model lab is to make the layer above the model abundant, standardized, and open. The model stays the priced layer in the stack.
Here is the part nobody is talking about: this changes who can ship. OpenAI’s product managers and designers now file feature requests directly into Symphony and get back a review packet with a video walkthrough of the working feature. They never check out the repo. They never open a Codex session. They describe the feature, the agent ships it.

OpenAI’s Strategic Stack Move

[Stack diagram, top to bottom: Human Review · Issue Tracker (Linear) · Editor · Symphony orchestration spec · Model layer (Codex) · Compute. OpenAI commoditizes the layer above its core product, following the Joel Spolsky principle: commoditize your complement.]

Quick Hits

◆ 15K GitHub stars in two weeks. Linear founder Karri Saarinen flagged a public spike in workspaces created the day Symphony shipped, suggesting adoption outside OpenAI is moving fast.
◆ Codex App Server, a headless mode that exposes Codex over a clean programmatic interface, made the orchestration possible. Terminal sessions and tmux do not scale to dozens of parallel agents; a clean protocol does.
◆ Agents create their own work. During implementation they spot performance issues, refactoring opportunities, and better architectures, and file new tickets for them. Most follow-ups get picked up by other agents.
◆ The same spec was reimplemented in six languages by Codex. Each reimplementation surfaced new ambiguities, which the team used to simplify the spec. A spec that survives translation is the gold standard for technical writing.

The Open Question

What is the right ratio of interactive Codex sessions to Symphony-managed tickets on a healthy team? OpenAI says some tasks still need interactive sessions for ambiguous, judgment-heavy work. But how do you triage a ticket as routine versus exploratory before the agent starts? Reply with the heuristic your team uses, and I will share the strongest ones in next week’s issue.

Next week
Why Anthropic’s Skills launch is the quiet counter-move to Symphony, and what changes when the agent ships its own playbook.
Source: An open-source spec for Codex orchestration: Symphony, OpenAI engineering, April 27, 2026.
Written by Deep, ResearchAudio.io. One paper a week, the part the field is sleeping on.
You receive this because you subscribed at ResearchAudio.io. Manage preferences from the Beehiiv portal.