Google Launches Fully Managed RAG System in Gemini API
File Search Tool handles storage, embeddings, and retrieval automatically with pay-per-indexing pricing
Bottom Line Up Front: Google launched a fully managed RAG system in the Gemini API that handles file storage, chunking, embeddings, and retrieval automatically. Storage and query-time embeddings cost nothing. You only pay $0.15 per 1M tokens for initial indexing. This changes the economics of building RAG applications for developers who want capability without infrastructure overhead.
The Problem Google Just Solved
Building a Retrieval-Augmented Generation system has always been powerful but painful. You need to manage vector databases, implement chunking strategies, handle embeddings, orchestrate retrieval, and inject context into prompts. Every developer rebuilds the same infrastructure from scratch.
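To make that concrete, here's roughly what the do-it-yourself plumbing looks like. This is a minimal, illustrative sketch only: embed() stands in for whatever embedding model you'd call, and an in-memory list stands in for a hosted vector database.

# Minimal sketch of the DIY RAG plumbing a managed service replaces.
# embed() is a hypothetical stand-in for a real embedding model;
# the "vector store" is just an in-memory list for illustration.
from math import sqrt

def embed(text: str) -> list[float]:
    # Stand-in: real systems call an embedding model here
    return [text.count(c) / (len(text) or 1) for c in "etaoinshrdlu"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb or 1.0)

def chunk(text, size=500, overlap=50):
    # Naive fixed-size chunking with overlap
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

store = []  # (vector, chunk) pairs you'd normally keep in a vector DB

def index_document(text):
    store.extend((embed(piece), piece) for piece in chunk(text))

def retrieve(question, top_k=3):
    q = embed(question)
    return [c for _, c in sorted(store, key=lambda vc: -cosine(vc[0], q))[:top_k]]

# The retrieved chunks then get injected into the LLM prompt yourself.

Every piece of that (chunking, embeddings, storage, retrieval, prompt injection) is undifferentiated plumbing you have to build, host, and debug.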
Google just eliminated all of that complexity with the File Search Tool in the Gemini API. It's a fully managed RAG system that abstracts away the entire retrieval pipeline so you can focus on building applications instead of infrastructure.
What Google Announced
The File Search Tool is now available in the Gemini API with three breakthrough features:
1. Complete RAG Automation
The File Search Tool handles everything automatically:
- File storage management - Upload your documents and forget about infrastructure
- Optimal chunking strategies - Google determines the best way to split your documents
- Automatic embeddings - Powered by the Gemini Embedding model
- Dynamic context injection - Retrieved content is automatically added to your prompts
- Built-in citations - Responses include source references for verification
2. Pay-Per-Indexing Pricing Model
Google's pricing structure focuses charges on the indexing phase:
- No storage charges - Document storage has no ongoing cost
- No query-time embedding charges - Embeddings generated during searches have no cost
- $0.15 per 1M tokens for indexing - You only pay when you first upload and index your files
Cost comparison: Traditional RAG setups require paying for vector database hosting, compute for embeddings, and storage. Google's model reduces these costs significantly, particularly for applications with high query volumes.
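To put the indexing number in perspective, here's a back-of-envelope estimate; the tokens-per-page figure is a rough assumption, not from Google:

# Back-of-envelope: one-time cost to index a documentation corpus.
# Assumption: ~500 tokens per page of typical prose.
pages = 10_000
tokens = pages * 500                           # ~5M tokens
cost = (tokens / 1_000_000) * 0.15             # $0.15 per 1M tokens indexed
print(f"One-time indexing cost: ${cost:.2f}")  # -> $0.75

Indexing ten thousand pages for under a dollar, with no recurring storage bill afterward, is what makes the high-query-volume case attractive.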
3. Powerful Vector Search
The File Search Tool uses Google's state-of-the-art Gemini Embedding model for semantic search. This means:
- The system understands meaning and context, not just keyword matching
- It finds relevant information even when the exact words aren't used in your documents
- Supports a wide range of formats: PDF, DOCX, TXT, JSON, and common programming language files
How It Works in Practice
The File Search Tool integrates directly into the existing generateContent API. You don't need to learn new APIs or change your workflow significantly.
Code Example: Upload and Query
Here's a complete example showing how to create a file search store, upload a document, and query it:
from google import genai
import time

client = genai.Client()

# Step 1: Create a File Search store
file_search_store = client.file_search_stores.create(
    config={'display_name': 'my-knowledge-base'}
)

# Step 2: Upload and import a file directly
operation = client.file_search_stores.upload_to_file_search_store(
    file='technical_manual.pdf',
    file_search_store_name=file_search_store.name,
    config={
        'display_name': 'Product Manual',
        'custom_metadata': [
            {'key': 'doc_type', 'string_value': 'manual'},
            {'key': 'version', 'string_value': '2.0'}
        ]
    }
)

# Step 3: Wait for indexing to complete
while not operation.done:
    time.sleep(1)
    operation = client.operations.get(operation)

# Step 4: Query using File Search
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='How do I reset the device?',
    config={
        'tools': [{
            'file_search': {
                'file_search_store_names': [file_search_store.name]
            }
        }]
    }
)

print("Answer:", response.text)

# Check for citations
if response.candidates[0].grounding_metadata:
    print("Citations:", response.candidates[0].grounding_metadata)
Advanced: Metadata Filtering
Filter queries to specific documents using metadata:
# Query only documents tagged as 'manual'
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What are the safety precautions?',
    config={
        'tools': [{
            'file_search': {
                'file_search_store_names': [file_search_store.name],
                'metadata_filter': 'doc_type="manual"'
            }
        }]
    }
)

print("Filtered answer:", response.text)
JavaScript/TypeScript Example
The same functionality in JavaScript:
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

// Create store and upload file
const fileStore = await ai.fileSearchStores.create({
  config: { displayName: 'my-knowledge-base' }
});

let uploadOp = await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'technical_manual.pdf',
  config: {
    displayName: 'Product Manual'
  }
});

// Wait for indexing to finish before querying
while (!uploadOp.done) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  uploadOp = await ai.operations.get({ operation: uploadOp });
}

// Query with File Search
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'How do I reset the device?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name]
      }
    }]
  }
});

console.log('Answer:', response.text);
Real-World Performance: Beam's Results
Beam, an AI-driven game generation platform developed by Phaser Studio, was part of Google's early access program. Their results are impressive:
- Running thousands of searches daily against a growing library of template data
- File Search handles parallel queries across all corpora, combining results in under 2 seconds
- Previous manual cross-referencing took hours
Taken literally, 2 seconds versus a 2-hour manual process is a 3,600x speedup. Whatever the exact baseline, this is production-ready performance at scale.
Who Should Use This
Google's early access developers are already building:
- Intelligent support bots - Customer service agents that can search through documentation, FAQs, and support tickets instantly (a minimal sketch follows this list)
- Internal knowledge assistants - Corporate search tools that understand company documents, policies, and procedures
- Creative content discovery platforms - Tools that help users find relevant information across large content libraries
- Technical documentation systems - Code search, API documentation, and developer tools that understand intent
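To show how little code the support-bot case takes, here's a minimal loop over a store that already holds your documentation. It assumes the client and file_search_store objects from the Python example above; a sketch, not a production bot:

# Minimal support-bot loop; assumes `client` and `file_search_store`
# from the earlier example, with documentation already indexed.
def ask_support_bot(question: str) -> str:
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=question,
        config={
            'tools': [{
                'file_search': {
                    'file_search_store_names': [file_search_store.name]
                }
            }]
        }
    )
    return response.text

while True:
    question = input("Customer question (blank to quit): ")
    if not question:
        break
    print(ask_support_bot(question))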
How Other Providers Handle RAG
Google's fully managed approach is unique. Here's how the major providers compare:
OpenAI: Assistants API with File Search
OpenAI offers a similar managed RAG solution through their Assistants API. Key differences:
- Storage costs: OpenAI charges $0.10 per GB per day for vector storage (Google: FREE)
- Retrieval costs: OpenAI charges per retrieval operation (Google: FREE)
- Integration: Tied to the Assistants API framework vs Google's direct integration with generateContent
Anthropic: Build-Your-Own with Citation Support
Anthropic takes a different approach with Claude:
- No managed RAG API: Developers must build and host their own vector database infrastructure
- Citation support: Claude API supports search result content blocks for proper source attribution
- Projects RAG (web UI only): Claude.ai Projects automatically enable RAG when knowledge exceeds context window, but this is NOT available via API
- Philosophy: Anthropic provides tools and documentation for building custom RAG systems rather than a managed service
Anthropic's approach gives developers maximum flexibility but requires managing vector databases (Pinecone, Weaviate, MongoDB Atlas, etc.), embedding models, and retrieval logic.
Key insight: Google is betting that abstracting away RAG infrastructure will accelerate adoption, while Anthropic believes developers want control over their retrieval pipeline. OpenAI sits in the middle with managed infrastructure but higher costs.
Why This Matters
This announcement is significant for three reasons:
1. It Commoditizes RAG Infrastructure
Building and maintaining RAG systems has been a competitive advantage for companies. Google is making it a commodity feature. This accelerates the entire AI application ecosystem by removing a major technical barrier.
2. The Pricing Model Changes RAG Economics
No storage or query-time embedding charges change the cost structure of RAG applications. This pricing model particularly benefits high-query-volume applications, which have traditionally been expensive to operate. Google's approach favors developer adoption by reducing operational costs.
3. Competitive Dynamics Are Shifting
Google's aggressive pricing undercuts OpenAI's Assistants API (which charges for storage and retrieval). OpenAI will need to respond or justify the cost premium. Anthropic's build-your-own philosophy faces pressure as managed solutions become cheaper and easier. For developers, this competition means better tools and lower costs regardless of which provider they choose.
Technical Considerations
While the File Search Tool is powerful, developers should understand its design decisions:
Automatic Chunking
Google determines chunking strategies automatically, and the upload config exposes only coarse controls (chunk size and overlap, as sketched below). That's convenient, but it means you can't implement fully custom, domain-specific chunking logic. For most applications, Google's chunking will be excellent; for specialized cases such as structure-aware or code-aware splitting, you may still need a self-managed solution.
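For the coarse tuning that is available, the upload config accepts a chunking_config block per the File Search documentation at launch. A minimal sketch; treat the exact field names as something to verify against the current docs:

# Sketch: coarse chunking control at upload time. The chunking_config /
# white_space_config field names follow the File Search docs at launch;
# verify them against the current documentation.
operation = client.file_search_stores.upload_to_file_search_store(
    file='technical_manual.pdf',
    file_search_store_name=file_search_store.name,
    config={
        'chunking_config': {
            'white_space_config': {
                'max_tokens_per_chunk': 200,
                'max_overlap_tokens': 20
            }
        }
    }
)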
Embedding Model Lock-in
File Search uses the Gemini Embedding model. You can't swap in different embedding models or experiment with fine-tuned embeddings. This trade-off makes sense for a managed service, but it's worth noting if you need embedding customization.
Built-in Citations
The automatic citation system is valuable for verification and trust. Responses include references to specific parts of your documents. This is essential for applications where factual accuracy and source attribution matter.
How to Get Started
The File Search Tool is available now in the Gemini API. Google provides:
- Documentation: Complete technical documentation at ai.google.dev/gemini-api/docs/file-search
- Demo app: "Ask the Manual" demo in Google AI Studio (requires a paid API key) at aistudio.google.com/apps/bundled/ask_the_manual
- Remix capability: You can clone and customize the demo app to build your own applications
The Bottom Line
Google's File Search Tool makes RAG accessible to every developer by eliminating infrastructure complexity and introducing developer-friendly pricing. Free storage and query-time embeddings combined with $0.15 per 1M tokens for indexing create an economic model that favors experimentation and scale.
The 2-second response times demonstrated by Beam show this is production-ready technology. For developers building support bots, knowledge assistants, or any application that needs to search through documents intelligently, this is the easiest path to RAG.
The competitive implication: Google is undercutting OpenAI's pricing while offering comparable managed RAG capabilities. OpenAI charges for storage and retrieval; Google makes both free. Anthropic's build-your-own philosophy still offers maximum flexibility, but it faces the same squeeze as managed options get cheaper and simpler. For developers, this competition delivers better infrastructure at lower costs across the entire AI ecosystem.
Key Takeaways
- Google launched a fully managed RAG system in the Gemini API that handles storage, chunking, embeddings, and retrieval automatically
- Storage and query-time embeddings have no cost. You only pay $0.15 per 1M tokens for initial indexing
- Built-in citations provide automatic source attribution for verification
- Beam demonstrated 2-second response times across thousands of daily searches, replacing hours of manual work
- Google undercuts OpenAI's Assistants API pricing (which charges for storage and retrieval). Anthropic requires developers to build custom RAG infrastructure
- Code examples in Python and JavaScript make implementation straightforward with minimal boilerplate
- This commoditizes RAG infrastructure, accelerating AI application development across the ecosystem
ResearchAudio.io - AI Research Newsletter
All information verified against official sources. Zero speculation, 100% factual.