363 Tokens Per Second at 25 Cents per Million

ResearchAudio.io  |  March 4, 2026

Gemini 3.1 Flash-Lite and GPT-5.3 Instant both shipped March 3. Two very different bets on what AI needs next.

On the same Tuesday morning, OpenAI and Google each published a blog post about a new model. Neither was a frontier release. Neither claimed a new benchmark state of the art. Both were, in their own way, more interesting than a headline number would suggest.

OpenAI shipped GPT-5.3 Instant, an update to the fast model that powers most everyday ChatGPT conversations. Google shipped Gemini 3.1 Flash-Lite, a developer-facing model priced at $0.25 per million input tokens. The two releases reflect two different hypotheses about where the real problem is: one company says the bottleneck is behavioral quality; the other says it is inference cost.

GPT-5.3 Instant: When Benchmarks Miss the Point

OpenAI's announcement for GPT-5.3 Instant is notable for what it does not lead with. There are no new benchmark scores on MMMU or GPQA. The release blog opens with a complaint: GPT-5.2 Instant would sometimes respond to ordinary questions with unsolicited phrases like "Stop. Take a breath." Users noticed, subscriptions lapsed, and GPT-5.3 Instant is the direct response.

The core change is a recalibration of tone, refusal behavior, and web search synthesis. According to OpenAI's internal evaluations, the update reduces hallucination rates by 26.8% on high-stakes domains (law, medicine, finance) when web search is active, and by 19.7% when the model relies only on internal knowledge. On a separate user-feedback evaluation drawn from flagged ChatGPT conversations, hallucinations dropped 22.5% with web search and 9.6% without.

26.8%  hallucination reduction (web search on, high-stakes eval)
19.7%  hallucination reduction (web search off, high-stakes eval)
22.5%  hallucination reduction (web search on, user-feedback eval)

The refusal behavior change is also meaningful. OpenAI describes GPT-5.3 Instant as more willing to answer questions it can safely address, with fewer "moralizing preambles." The company illustrated this with an archery trajectory prompt: GPT-5.2 Instant opened with a lengthy disclaimer about weapons; GPT-5.3 Instant answered the physics directly. The model is rolling out to all ChatGPT users and is available via the API as gpt-5.3-chat-latest.

The web search behavior also changed. Prior versions would frequently return long link lists or loosely connected facts. GPT-5.3 Instant is designed to blend retrieved results with its own knowledge, surface the most relevant information up front, and recognize the intent behind a question rather than interpreting the query literally. One acknowledged limitation: Japanese and Korean responses still sound stilted in some contexts. Multilingual parity is listed as ongoing development work.

Key Insight: GPT-5.3 Instant treats alignment as a product problem, not a safety problem. The hallucination reductions and refusal changes are both downstream of the same diagnosis: users were abandoning ChatGPT not because it lacked capability but because its behavior was off-putting. The model is designed to fix that without touching frontier reasoning.

Gemini 3.1 Flash-Lite: The Cost-per-Token Argument

Google's release is a different kind of announcement. Gemini 3.1 Flash-Lite is the first Flash-Lite model in the Gemini 3 series, available in preview to developers via Google AI Studio and to enterprises via Vertex AI. It is not yet available in the consumer Gemini app. The model's central argument is cost and speed: $0.25 per million input tokens and $1.50 per million output tokens, with a generation speed of up to 363 tokens per second.

Compared to Gemini 2.5 Flash, the prior cost-efficient reference point, Flash-Lite delivers a 2.5x faster time to first token and a 45% increase in output speed according to the Artificial Analysis benchmark. Context window is 1 million tokens, with outputs up to 64,000 tokens. The architecture inherits from Gemini 3 Pro, which uses a mixture-of-experts design that activates only a subset of parameters per query, directly reducing inference cost.
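To make the throughput figure concrete, here is a back-of-the-envelope latency sketch using only the numbers quoted above (it assumes a constant token rate and ignores time to first token):

```python
# Back-of-the-envelope generation-time math for Gemini 3.1 Flash-Lite,
# using only the figures quoted in the release coverage.

PEAK_TOKENS_PER_SEC = 363    # peak output speed
MAX_OUTPUT_TOKENS = 64_000   # output token limit

def generation_seconds(n_tokens: int, tok_per_sec: float = PEAK_TOKENS_PER_SEC) -> float:
    """Seconds to stream n_tokens at a constant token rate (ignores TTFT)."""
    return n_tokens / tok_per_sec

# A 1,000-token answer streams in under 3 seconds at peak speed.
short_answer = generation_seconds(1_000)            # ~2.75 s

# Even a maximum-length 64K-token output finishes in about 3 minutes.
max_output = generation_seconds(MAX_OUTPUT_TOKENS)  # ~176 s

print(f"1K tokens:  {short_answer:.1f} s")
print(f"64K tokens: {max_output:.1f} s")
```

Real-world throughput will vary with load and prompt size; the point is only the order of magnitude that 363 tok/s implies.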

Gemini 3.1 Flash-Lite  |  Architecture + Benchmarks

Architecture: Gemini 3 Pro base; mixture-of-experts; partial parameter activation per query
Context: 1M input tokens; 64K output tokens; multimodal input
Pricing: $0.25 / 1M input; $1.50 / 1M output; via Vertex AI + AI Studio
Benchmarks: GPQA Diamond 86.9%; MMMU Pro 76.8%
Speed vs. 2.5 Flash: 2.5x faster time to first token; 45% faster output speed; up to 363 tok/s peak

Source: Google blog, March 3, 2026 & Artificial Analysis benchmark

On the Arena.ai leaderboard, Flash-Lite scores 1432 Elo points. In Google's own 11-benchmark evaluation, it outperformed GPT-5 mini and Claude 4.5 Haiku on 6 of the 11 tests. Google positions it for tasks like translation, content moderation, UI generation, and simulation, where request volume is high and per-query reasoning depth is not critical.

One design detail worth noting: Flash-Lite includes adjustable "thinking levels," allowing developers to tune how much reasoning the model applies before generating output. For bulk processing jobs, this option lets teams minimize token production and therefore cost. Google did not publish agent benchmarks for Flash-Lite, explicitly stating the model is designed for data processing tasks rather than multi-step agent coordination.
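As a sketch of how a bulk-processing job might dial reasoning down, here is a minimal request builder. Note that both the model id and the thinking_level field are assumptions inferred from the description above, not a confirmed API schema:

```python
# Hypothetical request builder for a bulk-processing job.
# The model id and the "thinking_level" field name are assumptions
# drawn from the article's description, not a confirmed API schema.

def build_request(prompt: str, thinking_level: str = "low") -> dict:
    """Return a request payload that minimizes reasoning tokens for bulk jobs."""
    allowed = {"off", "low", "medium", "high"}  # assumed level names
    if thinking_level not in allowed:
        raise ValueError(f"unknown thinking level: {thinking_level}")
    return {
        "model": "gemini-3.1-flash-lite-preview",  # hypothetical id
        "contents": prompt,
        "config": {"thinking_level": thinking_level},
    }

# Content moderation at volume: turn reasoning off to cut output tokens.
req = build_request("Classify this comment as SAFE or UNSAFE: ...", "off")
print(req["config"])  # {'thinking_level': 'off'}
```

The design point stands regardless of the exact field names: for high-volume classification, every reasoning token the model skips is billed output you never pay for.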

Key Insight: Flash-Lite is not competing with Gemini 3.1 Pro on reasoning. It is competing with the economics of deploying any model at scale. At $0.25 per million input tokens, a system processing 100 million tokens per day spends $25 on input. That changes the math on what AI-augmented pipelines can afford to run continuously.
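The arithmetic in that insight generalizes into a one-line cost model. Prices are from the release; the traffic volumes below are illustrative, not source figures:

```python
# Daily inference cost at Flash-Lite's published prices.
INPUT_PRICE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 1.50 / 1_000_000  # dollars per output token

def daily_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollars per day for a pipeline at a given daily token volume."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# The example from the text: 100M input tokens/day is $25 of input spend.
print(daily_cost(100_000_000, 0))            # ~25 dollars/day

# A moderation pipeline emitting short labels (say 10M output tokens/day,
# an illustrative volume) adds $15, for roughly $40/day total.
print(daily_cost(100_000_000, 10_000_000))   # ~40 dollars/day
```

At these prices the output side dominates quickly, which is why the adjustable thinking levels matter: reasoning tokens are billed as output.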

Two Hypotheses, One Tuesday

Placing these two releases side by side makes the contrast cleaner. OpenAI's GPT-5.3 Instant is a behavioral update to a model that already handles most ChatGPT traffic. Its improvements (fewer hallucinations, fewer preachy refusals) address friction in human-to-AI conversation. Google's Gemini 3.1 Flash-Lite is an infrastructure update. Its improvements (cost and speed) address friction in developer-to-API pipelines.

GPT-5.3 Instant
  Target: ChatGPT consumer users
  Key metric: 26.8% fewer hallucinations (web)
  Available: all ChatGPT users; API as gpt-5.3-chat-latest
  Architecture change: none; behavioral training only

Gemini 3.1 Flash-Lite
  Target: high-volume developer pipelines
  Key metric: 363 tokens/sec at $0.25/1M input
  Available: developer preview (AI Studio, Vertex AI)
  Architecture: MoE from Gemini 3 Pro base

The industry context matters here. ChatGPT uninstalls reportedly spiked 295% in late February after OpenAI's Pentagon contract drew criticism, and Claude reached the top free app position in the US iOS App Store on March 2, the day before both these releases. GPT-5.3 Instant arrives in that environment. It is, at least in part, a product response.

Key Insight: Both releases point at the same underlying trend: the capability gap between frontier models is narrowing fast enough that the competitive surface is shifting elsewhere. For OpenAI, the differentiator is conversational behavior. For Google, it is inference economics. Neither claim is about raw intelligence anymore.

OpenAI has already teased "5.4 sooner than you think." At this iteration speed, the more durable question is not which model wins a benchmark this quarter. It is which company builds the deployment layer that makes model updates transparent to end users, so that improvements in hallucination rates and token throughput compound without requiring anyone to change their code.

ResearchAudio.io  |  AI Research, Explained

Sources: OpenAI GPT-5.3 Instant  |  Google Gemini 3.1 Flash-Lite
