Google Launches Fully Managed RAG System in Gemini API
File Search Tool handles storage, embeddings, and retrieval automatically with pay-per-indexing pricing
Bottom Line Up Front: Google launched a fully managed RAG system in the Gemini API that handles file storage, chunking, embeddings, and retrieval automatically. Storage and query-time embeddings cost nothing. You only pay $0.15 per 1M tokens for initial indexing. This changes the economics of building RAG applications for developers who want capability without infrastructure overhead.
The Problem Google Just Solved
Building a Retrieval-Augmented Generation system has always been powerful but painful. You need to manage vector databases, implement chunking strategies, handle embeddings, orchestrate retrieval, and inject context into prompts. Every developer rebuilds the same infrastructure from scratch.
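To make that concrete, here's roughly what the do-it-yourself plumbing looks like. This is a minimal, illustrative sketch only: embed() stands in for whatever embedding model you'd call, and an in-memory list stands in for a hosted vector database.

# Minimal sketch of the DIY RAG plumbing a managed service replaces.
# embed() is a hypothetical stand-in for a real embedding model;
# the "vector store" is just an in-memory list for illustration.
from math import sqrt

def embed(text: str) -> list[float]:
    # Stand-in: real systems call an embedding model here
    return [text.count(c) / (len(text) or 1) for c in "etaoinshrdlu"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb or 1.0)

def chunk(text, size=500, overlap=50):
    # Naive fixed-size chunking with overlap
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

store = []  # (vector, chunk) pairs you'd normally keep in a vector DB

def index_document(text):
    store.extend((embed(piece), piece) for piece in chunk(text))

def retrieve(question, top_k=3):
    q = embed(question)
    return [c for _, c in sorted(store, key=lambda vc: -cosine(vc[0], q))[:top_k]]

# The retrieved chunks then get injected into the LLM prompt yourself.

Every piece of that (chunking, embeddings, storage, retrieval, prompt injection) is undifferentiated plumbing you have to build, host, and debug.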
Google just eliminated all of that complexity with the File Search Tool in the Gemini API. It's a fully managed RAG system that abstracts away the entire retrieval pipeline so you can focus on building applications instead of infrastructure.
What Google Announced
The File Search Tool is now available in the Gemini API with three breakthrough features:
1. Complete RAG Automation
The File Search Tool handles everything automatically:
- File storage management - Upload your documents and forget about infrastructure
- Optimal chunking strategies - Google determines the best way to split your documents
- Automatic embeddings - Powered by the Gemini Embedding model
- Dynamic context injection - Retrieved content is automatically added to your prompts
- Built-in citations - Responses include source references for verification
2. Pay-Per-Indexing Pricing Model
Google's pricing structure focuses charges on the indexing phase:
- No storage charges - Document storage has no ongoing cost
- No query-time embedding charges - Embeddings generated during searches have no cost
- $0.15 per 1M tokens for indexing - You only pay when you first upload and index your files
Cost comparison: Traditional RAG setups require paying for vector database hosting, compute for embeddings, and storage. Google's model reduces these costs significantly, particularly for applications with high query volumes.
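To put the indexing number in perspective, here's a back-of-envelope estimate; the tokens-per-page figure is a rough assumption, not from Google:

# Back-of-envelope: one-time cost to index a documentation corpus.
# Assumption: ~500 tokens per page of typical prose.
pages = 10_000
tokens = pages * 500                           # ~5M tokens
cost = (tokens / 1_000_000) * 0.15             # $0.15 per 1M tokens indexed
print(f"One-time indexing cost: ${cost:.2f}")  # -> $0.75

Indexing ten thousand pages for under a dollar, with no recurring storage bill afterward, is what makes the high-query-volume case attractive.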
3. Powerful Vector Search
The File Search Tool uses Google's state-of-the-art Gemini Embedding model for semantic search. This means:
- The system understands meaning and context, not just keyword matching
- It finds relevant information even when the exact words aren't used in your documents
- Supports a wide range of formats: PDF, DOCX, TXT, JSON, and common programming language files
How It Works in Practice
The File Search Tool integrates directly into the existing generateContent API. You don't need to learn new APIs or change your workflow significantly.
Code Example: Upload and Query
Here's a complete example showing how to create a file search store, upload a document, and query it:
from google import genai
import time

client = genai.Client()

# Step 1: Create a File Search store
file_search_store = client.file_search_stores.create(
    config={'display_name': 'my-knowledge-base'}
)

# Step 2: Upload and import a file directly
operation = client.file_search_stores.upload_to_file_search_store(
    file='technical_manual.pdf',
    file_search_store_name=file_search_store.name,
    config={
        'display_name': 'Product Manual',
        'custom_metadata': [
            {'key': 'doc_type', 'string_value': 'manual'},
            {'key': 'version', 'string_value': '2.0'}
        ]
    }
)

# Step 3: Wait for indexing to complete
while not operation.done:
    time.sleep(1)
    operation = client.operations.get(operation)

# Step 4: Query using File Search
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='How do I reset the device?',
    config={
        'tools': [{
            'file_search': {
                'file_search_store_names': [file_search_store.name]
            }
        }]
    }
)

print("Answer:", response.text)

# Check for citations
if response.candidates[0].grounding_metadata:
    print("Citations:", response.candidates[0].grounding_metadata)
Advanced: Metadata Filtering
Filter queries to specific documents using metadata:
# Query only documents tagged as 'manual'
response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What are the safety precautions?',
    config={
        'tools': [{
            'file_search': {
                'file_search_store_names': [file_search_store.name],
                'metadata_filter': 'doc_type="manual"'
            }
        }]
    }
)

print("Filtered answer:", response.text)
JavaScript/TypeScript Example
The same functionality in JavaScript:
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({});

// Create store and upload file
const fileStore = await ai.fileSearchStores.create({
  config: { displayName: 'my-knowledge-base' }
});

let uploadOp = await ai.fileSearchStores.uploadToFileSearchStore({
  fileSearchStoreName: fileStore.name,
  file: 'technical_manual.pdf',
  config: {
    displayName: 'Product Manual'
  }
});

// Wait for indexing to finish before querying
while (!uploadOp.done) {
  await new Promise((resolve) => setTimeout(resolve, 1000));
  uploadOp = await ai.operations.get({ operation: uploadOp });
}

// Query with File Search
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'How do I reset the device?',
  config: {
    tools: [{
      fileSearch: {
        fileSearchStoreNames: [fileStore.name]
      }
    }]
  }
});

console.log('Answer:', response.text);
Real-World Performance: Beam's Results
Beam, an AI-driven game generation platform developed by Phaser Studio, was part of Google's early access program. Their results are impressive:
- Running thousands of searches daily against a growing library of template data
- File Search handles parallel queries across all corpora, combining results in under 2 seconds
- Previous manual cross-referencing took hours
Taken literally, 2 seconds versus a 2-hour manual process is a 3,600x speedup. Whatever the exact baseline, this is production-ready performance at scale.
Who Should Use This
Google's early access developers are already building:
- Intelligent support bots - Customer service agents that can search through documentation, FAQs, and support tickets instantly (a minimal sketch follows this list)
- Internal knowledge assistants - Corporate search tools that understand company documents, policies, and procedures
- Creative content discovery platforms - Tools that help users find relevant information across large content libraries
- Technical documentation systems - Code search, API documentation, and developer tools that understand intent
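To show how little code the support-bot case takes, here's a minimal loop over a store that already holds your documentation. It assumes the client and file_search_store objects from the Python example above; a sketch, not a production bot:

# Minimal support-bot loop; assumes `client` and `file_search_store`
# from the earlier example, with documentation already indexed.
def ask_support_bot(question: str) -> str:
    response = client.models.generate_content(
        model='gemini-2.5-flash',
        contents=question,
        config={
            'tools': [{
                'file_search': {
                    'file_search_store_names': [file_search_store.name]
                }
            }]
        }
    )
    return response.text

while True:
    question = input("Customer question (blank to quit): ")
    if not question:
        break
    print(ask_support_bot(question))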
How Other Providers Handle RAG
Google's fully managed approach is unique. Here's how the major providers compare:
OpenAI: Assistants API with File Search
OpenAI offers a similar managed RAG solution through their Assistants API. Key differences:
- Storage costs: OpenAI charges $0.10 per GB per day for vector storage (Google: FREE)
- Retrieval costs: OpenAI charges per retrieval operation (Google: FREE)
- Integration: Tied to the Assistants API framework vs Google's direct integration with generateContent
Anthropic: Build-Your-Own with Citation Support
Anthropic takes a different approach with Claude:
- No managed RAG API: Developers must build and host their own vector database infrastructure
- Citation support: Claude API supports search result content blocks for proper source attribution
- Projects RAG (web UI only): Claude.ai Projects automatically enable RAG when knowledge exceeds context window, but this is NOT available via API
- Philosophy: Anthropic provides tools and documentation for building custom RAG systems rather than a managed service
Anthropic's approach gives developers maximum flexibility but requires managing vector databases (Pinecone, Weaviate, MongoDB Atlas, etc.), embedding models, and retrieval logic.
Key insight: Google is betting that abstracting away RAG infrastructure will accelerate adoption, while Anthropic believes developers want control over their retrieval pipeline. OpenAI sits in the middle with managed infrastructure but higher costs.
Why This Matters
This announcement is significant for three reasons:
1. It Commoditizes RAG Infrastructure
Building and maintaining RAG systems has been a competitive advantage for companies. Google is making it a commodity feature. This accelerates the entire AI application ecosystem by removing a major technical barrier.
2. The Pricing Model Changes RAG Economics
No storage or query-time embedding charges change the cost structure of RAG applications. This pricing model particularly benefits high-query-volume applications, which have traditionally been expensive to operate. Google's approach favors developer adoption by reducing operational costs.
3. Competitive Dynamics Are Shifting
Google's aggressive pricing undercuts OpenAI's Assistants API (which charges for storage and retrieval). OpenAI will need to respond or justify the cost premium. Anthropic's build-your-own philosophy faces pressure as managed solutions become cheaper and easier. For developers, this competition means better tools and lower costs regardless of which provider they choose.
Technical Considerations
While the File Search Tool is powerful, developers should understand its design decisions:
Automatic Chunking
Google determines chunking strategies automatically, and the upload config exposes only coarse controls (chunk size and overlap, as sketched below). That's convenient, but it means you can't implement fully custom, domain-specific chunking logic. For most applications, Google's chunking will be excellent; for specialized cases such as structure-aware or code-aware splitting, you may still need a self-managed solution.
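For the coarse tuning that is available, the upload config accepts a chunking_config block per the File Search documentation at launch. A minimal sketch; treat the exact field names as something to verify against the current docs:

# Sketch: coarse chunking control at upload time. The chunking_config /
# white_space_config field names follow the File Search docs at launch;
# verify them against the current documentation.
operation = client.file_search_stores.upload_to_file_search_store(
    file='technical_manual.pdf',
    file_search_store_name=file_search_store.name,
    config={
        'chunking_config': {
            'white_space_config': {
                'max_tokens_per_chunk': 200,
                'max_overlap_tokens': 20
            }
        }
    }
)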
Embedding Model Lock-in
File Search uses the Gemini Embedding model. You can't swap in different embedding models or experiment with fine-tuned embeddings. This trade-off makes sense for a managed service, but it's worth noting if you need embedding customization.
Built-in Citations
The automatic citation system is valuable for verification and trust. Responses include references to specific parts of your documents. This is essential for applications where factual accuracy and source attribution matter.
How to Get Started
The File Search Tool is available now in the Gemini API. Google provides:
- Documentation: Complete technical documentation at ai.google.dev/gemini-api/docs/file-search
- Demo app: "Ask the Manual" demo in Google AI Studio (requires a paid API key) at aistudio.google.com/apps/bundled/ask_the_manual
- Remix capability: You can clone and customize the demo app to build your own applications
The Bottom Line
Google's File Search Tool makes RAG accessible to every developer by eliminating infrastructure complexity and introducing developer-friendly pricing. Free storage and query-time embeddings combined with $0.15 per 1M tokens for indexing create an economic model that favors experimentation and scale.
The 2-second response times demonstrated by Beam show this is production-ready technology. For developers building support bots, knowledge assistants, or any application that needs to search through documents intelligently, this is the easiest path to RAG.
The competitive implication: Google is undercutting OpenAI's pricing while offering comparable managed RAG capabilities. OpenAI charges for storage and retrieval; Google makes both free. Anthropic's build-your-own philosophy still offers maximum flexibility, but it faces the same squeeze as managed options get cheaper and simpler. For developers, this competition delivers better infrastructure at lower costs across the entire AI ecosystem.
Key Takeaways
- Google launched a fully managed RAG system in the Gemini API that handles storage, chunking, embeddings, and retrieval automatically
- Storage and query-time embeddings have no cost. You only pay $0.15 per 1M tokens for initial indexing
- Built-in citations provide automatic source attribution for verification
- Beam demonstrated 2-second response times across thousands of daily searches, replacing hours of manual work
- Google undercuts OpenAI's Assistants API pricing (which charges for storage and retrieval). Anthropic requires developers to build custom RAG infrastructure
- Code examples in Python and JavaScript make implementation straightforward with minimal boilerplate
- This commoditizes RAG infrastructure, accelerating AI application development across the ecosystem
ResearchAudio.io - AI Research Newsletter
All information verified against official sources. Zero speculation, 100% factual.