The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.
Unlock a focused set of AI strategies built to streamline your work and maximize impact. This guide delivers the practical tactics and tools marketers need to start seeing results right away:
7 high-impact AI strategies to accelerate your marketing performance
Practical use cases for content creation, lead gen, and personalization
Expert insights into how top marketers are using AI today
A framework to evaluate and implement AI tools efficiently
Stay ahead of the curve with these top AI-developed strategies for marketers, built for real-world results.
ResearchAudio.io
Databricks Trained a Search Agent with Zero Human Labels
33% lower cost and 47% lower latency than frontier models. The multi-task RL recipe inside.
Databricks built a search agent that matches Claude Opus 4.6 on enterprise knowledge tasks, at 33% lower cost per query and 47% lower latency. The model, called KARL (Knowledge Agents via Reinforcement Learning), was trained entirely on synthetic data the agent generated itself, with no human labeling required. The training cost: a few thousand GPU hours. The kicker? It even generalizes to tasks it was never trained on.
The Problem with Enterprise Search Agents

Enterprise knowledge tasks (searching internal documents, cross-referencing information, aggregating facts from meeting notes) are fundamentally different from math or coding. They are hard to verify: there is often no single correct answer, and the information needed is scattered across dozens of noisy, unstructured documents.

Most existing "deep research" agents rely on public web search and black-box tools, and it remains unclear whether those results generalize to proprietary enterprise data. Meanwhile, benchmarks like HotpotQA or BrowseComp capture only a narrow slice of real-world search behaviors. Databricks wanted to answer a harder question: can a single agent master multiple types of grounded reasoning at once?

How KARL Works: Three Core Components

1. KARLBench: six search regimes in one benchmark. Rather than testing on a single task, Databricks built KARLBench to evaluate six distinct search capabilities: constraint-driven entity search (BrowseComp-Plus), cross-document report synthesis (TREC-Biogen), tabular numerical reasoning over financial reports (FinanceBench), exhaustive entity retrieval (QAMPARI), procedural reasoning over technical docs (FreshStack), and fact aggregation over internal company meeting notes (PMBench). The agent is restricted to a single tool, vector search, isolating retrieval and reasoning quality from broader tool-orchestration effects.
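To make that single-tool restriction concrete, here is a minimal agent loop whose only available action is a vector-search call. This is an illustrative sketch, not KARL's implementation: the toy corpus, the bag-of-words similarity standing in for real embeddings, and the naive query-reformulation rule are all hypothetical; in KARL the reformulation policy is a trained LLM.

```python
def search_agent(question, vector_search, max_queries=5):
    """Single-tool agent loop: the only action is vector_search,
    mirroring KARLBench's restriction. The query-reformulation
    'policy' here is a naive stub; in KARL it is a trained LLM."""
    evidence, query = [], question
    for _ in range(max_queries):
        hits = [h for h in vector_search(query) if h not in evidence]
        if not hits:        # nothing new retrieved: stop searching
            break
        evidence.extend(hits)
        query = hits[0]     # reformulate the query from the newest finding
    return evidence

# Toy three-document corpus.
corpus = [
    "KARL was trained with multi-task reinforcement learning",
    "reinforcement learning optimizes a reward signal",
    "the reward signal comes from final task outcomes",
]

def toy_vector_search(query, k=1):
    """Bag-of-words token overlap as a stand-in for embedding similarity."""
    tokens = set(query.lower().replace("?", " ").split())
    scored = sorted(corpus,
                    key=lambda d: -len(tokens & set(d.lower().split())))
    return [d for d in scored[:k] if tokens & set(d.lower().split())]

evidence = search_agent("How was KARL trained?", toy_vector_search)
```

Keeping the tool surface this small is what lets the benchmark attribute performance differences to retrieval and reasoning quality rather than to tool orchestration.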
2. Agentic data synthesis (no humans needed). KARL generates its own training data through a two-stage pipeline. In Stage I, a synthesis agent explores a document corpus via vector search and proposes grounded question-answer pairs; a deduplication agent then filters out any overlap with the evaluation set. In Stage II, multiple solver agents independently attempt each question. Questions where every solver succeeds (too easy) or every solver fails (too hard or broken) are discarded. Only questions in the "learning sweet spot," where some solvers succeed and others fail, survive to become training data.

3. OAPL: off-policy RL that actually scales. Standard RL for language models (like GRPO) assumes the model generating training data and the model being updated stay in sync. In distributed training, they never do. Previous fixes (clipped importance weighting, data deletion) introduced instability. Databricks developed OAPL (Optimal Advantage-based Policy Optimization with Lagged Inference policy), which embraces off-policyness by design: think of it as a regression objective that fits the model toward the optimal policy rather than fighting the lag between data generation and training. OAPL remains stable even when the model generating rollouts is more than 400 gradient steps behind the model being trained, roughly 100x more off-policy than previous approaches tolerated. In code-generation experiments, OAPL matched GRPO using approximately 3x fewer training samples.

Two Findings That Stood Out

Multi-task RL generalizes; single-task RL does not. KARL-TREC (trained only on TREC-Biogen) scored 85.0 on its target task but failed to transfer to BrowseComp-Plus. KARL-BCP (trained only on BrowseComp-Plus) reached 59.6 on its task but similarly failed on TREC-Biogen. Trained on both tasks simultaneously, KARL matched or improved performance on each while also generalizing to four held-out tasks it had never seen during training.
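The Stage II difficulty filter described above reduces to a few lines. This is a sketch with illustrative names, not Databricks' code: each synthesized question is attempted by several solver agents, and only mixed-outcome questions are kept.

```python
def in_learning_sweet_spot(solver_passes):
    """Keep a synthesized question only if some solvers succeed and
    some fail. All-pass means the question is too easy to teach
    anything; all-fail usually means it is too hard or broken."""
    n_pass = sum(solver_passes)
    return 0 < n_pass < len(solver_passes)

# Four independent solver attempts per question:
mixed = in_learning_sweet_spot([True, False, True, False])   # keep
too_easy = in_learning_sweet_spot([True, True, True, True])  # discard
too_hard = in_learning_sweet_spot([False, False, False, False])  # discard
```

The filter doubles as a quality check: a question no solver can answer is as likely to be malformed as genuinely hard, so discarding the all-fail bucket removes broken synthetic data for free.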
The agent learned to compress its own context, end-to-end. Some KARLBench tasks require over 200 sequential vector database queries, exhausting the context window many times. Rather than training a separate summarization model, the team included compression as part of the RL training loop. The agent learned what to keep and what to discard, guided only by the final task outcome. Removing this learned compression dropped accuracy on BrowseComp-Plus from 57% to 39%.
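A toy version of that in-loop compression is sketched below. The budget, the keep-recent heuristic, and all names are illustrative assumptions: in KARL, the decision of what to keep is learned end-to-end by RL from the final task reward, whereas here a fixed stub simply summarizes older entries once the working context exceeds a budget.

```python
def run_with_compression(results, budget=8, keep_last=2):
    """Toy in-loop context compression: whenever the working context
    exceeds `budget` items, collapse everything but the most recent
    findings into a single summary note. In KARL this keep/discard
    policy is learned via RL; here it is a fixed heuristic."""
    context = []
    for result in results:
        context.append(result)
        if len(context) > budget:
            old, recent = context[:-keep_last], context[-keep_last:]
            note = f"<summary of {len(old)} earlier results>"
            context = [note] + recent
    return context

# Simulate 20 sequential vector-search results against a budget of 8.
ctx = run_with_compression([f"result-{i}" for i in range(20)])
```

The point of doing this inside the RL loop, rather than with a separate summarizer, is that the compression decisions are shaped by the same reward as the search decisions, which is what the 57% to 39% ablation gap suggests matters.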
Key Insights
The key takeaway here is not about KARL itself, but about the recipe. Databricks is now making these same RL pipelines available to customers for building custom agents on their own enterprise tasks. If a few thousand GPU hours and zero human labels can produce a model that is Pareto-optimal against frontier systems on six search tasks, the barrier to building domain-specific agents just dropped considerably.


