Deep Dive

AWS Goes All-In on Autonomous Agents and Custom Model Training at re:Invent 2025

Amazon unveiled Nova 2 models, Frontier Agents that work for days without intervention, Nova Forge for building custom models, and Trainium3 chips now in production.

AWS re:Invent 2025 made one thing clear: the era of AI chatbots is over. Amazon is betting that autonomous agents—AI systems that work independently for hours or days—are the next inflection point. CEO Matt Garman put it directly: agents are "turning AI from a technical wonder into something that delivers real value."

The announcements spanned models, agents, infrastructure, and enterprise tooling. Here's what matters for practitioners.

Nova 2: Four New Models with Adjustable Reasoning

Amazon released its Nova 2 model family—four models designed to compete directly with OpenAI and Anthropic on reasoning and multimodal capabilities.

Nova 2 Lite is the cost-optimized reasoning model for everyday workloads. It processes text, images, and video to generate text. The key feature: adjustable "thinking" levels (low, medium, high) that let you balance intelligence depth with speed and cost. Amazon claims it equals or exceeds Claude Haiku 4.5 on 13 of 15 benchmarks.

Nova 2 Pro is Amazon's most capable reasoning model, built for complex tasks like agentic coding, long-range planning, and advanced problem-solving. It processes text, images, video, and speech. Amazon says it matches or beats GPT-5.1 on 8 of 16 benchmarks.

Nova 2 Sonic is a speech-to-speech model for real-time voice conversations. It offers a one-million token context window for sustained interactions, multilingual support, and the ability to switch between voice and text mid-conversation. It integrates with Amazon Connect and telephony providers like AudioCodes.

Nova 2 Omni (in preview) is the first unified multimodal reasoning and generation model. It processes text, images, video, and speech while generating both text and images. It can handle 750,000 words, hours of audio, long videos, and hundred-page documents in a single context.

Both Lite and Pro include built-in web browsing and code execution capabilities.

What this means: The adjustable reasoning feature in Nova 2 Lite is genuinely useful for production systems. Being able to dial reasoning depth up or down based on query complexity could significantly reduce inference costs while maintaining quality where it matters. Organizations like Cisco, Siemens, and Trellix are already using Nova 2 for applications ranging from threat detection to video understanding.
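To make the "dial reasoning up or down" idea concrete, here is a minimal sketch of a router that picks one of the three thinking levels per query. Only the level names (low, medium, high) come from the announcement. The function name, the cue words, and the 40-word threshold are hypothetical placeholders, and the sketch deliberately makes no call to any AWS API.

```python
import re

# Level names as described in the announcement; the routing logic is illustrative.
LEVELS = ("low", "medium", "high")

# Hypothetical cue words that hint at multi-step reasoning.
REASONING_CUES = {"why", "prove", "debug", "plan", "compare", "derive"}


def choose_thinking_level(query: str) -> str:
    """Pick a thinking level from a crude complexity score (0, 1, or 2)."""
    words = re.findall(r"[a-z]+", query.lower())
    score = 0
    if len(words) > 40:              # long prompts get one point
        score += 1
    if REASONING_CUES & set(words):  # any reasoning cue word gets one point
        score += 1
    return LEVELS[score]
```

In practice you would replace the heuristic with something measured against your own traffic, such as a small classifier or a cost-versus-quality evaluation per level, and pass the chosen level to whatever inference call your stack uses.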

Nova Forge: Build Your Own Frontier Model for $100K/Year

Nova Forge is potentially the most significant announcement for enterprises. It's an "open training" service that lets organizations build custom variants of Nova—called "Novellas"—by blending proprietary data with Amazon's curated training data.

The key innovation: access to pre-trained, mid-trained, and post-trained model checkpoints. This lets you inject your data at any stage of training rather than just fine-tuning at the end.

AWS CEO Matt Garman explained the reasoning: "What I hear over and over is 'What I would really love is a frontier model that actually just understands my data.'" Traditional fine-tuning only scratches the surface. Training open-weight models risks catastrophic forgetting. Building from scratch costs hundreds of millions.

Nova Forge also includes reinforcement learning "gyms"—synthetic environments where models learn from simulated scenarios that reflect your real-world use cases.

Early customers include Booking.com, Reddit, Sony, Cosine AI, and Nomura Research Institute. Reddit is using Nova Forge to replace multiple specialized models with a single custom solution.

What this means: At $100,000/year, Nova Forge dramatically undercuts the cost of building frontier models from scratch. The catch: your custom model lives on Amazon Bedrock. You don't get the weights. For organizations already committed to AWS and needing deep domain customization, this could be transformative. For those wanting model portability, it's a significant lock-in consideration.

Frontier Agents: AI That Works for Days Without Intervention

AWS introduced a new class of AI called "Frontier Agents"—autonomous systems that can work for hours or days without constant human oversight. Three agents launched in preview:

Kiro autonomous agent is a virtual developer that learns how your team works by scanning existing code, reviewing pull requests, and observing interactions. It connects to repos, pipelines, Jira, GitHub, and Slack to maintain context across projects. AWS claims it can work independently for days, handling multiple tasks simultaneously.

AWS Security Agent acts as a virtual security engineer. It performs threat modeling, secure design reviews, code scanning (SAST/DAST), and penetration testing. Ask "What are the main threats in this feature design?" and it generates attack trees and mitigations. Feed it a repo and it surfaces insecure patterns with explanations and patches.

AWS DevOps Agent functions as a virtual operations engineer. It monitors systems, triages incidents, identifies root causes, and suggests or applies fixes. It analyzes data across CloudWatch, GitHub, ServiceNow, and other tools to coordinate incident response.

The agents are built on Amazon Bedrock AgentCore, which provides the control plane for enterprise agents: memory, policies, evaluations, and observability. New AgentCore features include Policy (natural language boundaries for agent actions) and 13 prebuilt evaluation systems for monitoring agent performance.

What this means: AWS isn't hiding the risks. Garman acknowledged that efficiency gains from Kiro were "more incremental than transformative" for the first few weeks as teams adjusted. The agents open pull requests for human review and don't merge without oversight. Still, an agent working for days without intervention raises legitimate concerns about error propagation. These are preview releases. Evaluate carefully before production deployment.

Nova Act: Browser Automation at 90% Reliability

Nova Act is a service for building AI agents that automate browser-based tasks. Unlike scripted automation, these agents understand UI context and can handle dynamic interfaces.

Use cases include form filling, search and extraction, shopping and booking flows, and QA testing. The service is powered by a specialized Nova 2 Lite variant trained with reinforcement learning across thousands of simulated web tasks.

The headline number: 90% reliability for browser-based workflows built by early customers. Developers can prototype agents using natural language in a no-code playground, refine them in VS Code, and deploy through AWS.

Hertz used Nova Act to accelerate development velocity by 5x. Amazon's Project Kuiper satellite team reduced test case creation from weeks to minutes.

Trainium3 Now in Production with 4x Performance Gains

AWS's custom AI chip story continued with Trainium3 UltraServers now generally available. Built on a 3-nanometer process, each chip delivers 2.52 petaflops of FP8 compute with 144 GB of HBM3e memory and 4.9 TB/s bandwidth.

A Trn3 UltraServer connects 144 chips, delivering 362 FP8 petaflops aggregate compute, 20.7 TB of memory, and 706 TB/s bandwidth. AWS claims 4.4x more performance and 4x better energy efficiency than Trainium2 UltraServers.
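The aggregate figures follow directly from the per-chip specs, which is a useful sanity check when comparing vendor numbers. A quick sketch, assuming simple linear scaling across 144 chips and decimal units (1 TB = 1,000 GB); the constant and function names are just for illustration:

```python
# Per-chip figures as quoted for Trainium3.
CHIPS_PER_ULTRASERVER = 144
FP8_PFLOPS_PER_CHIP = 2.52
HBM_GB_PER_CHIP = 144
BANDWIDTH_TBPS_PER_CHIP = 4.9


def ultraserver_totals() -> dict:
    """Multiply per-chip specs by chip count (assumes linear scaling)."""
    return {
        "fp8_pflops": CHIPS_PER_ULTRASERVER * FP8_PFLOPS_PER_CHIP,
        "memory_tb": CHIPS_PER_ULTRASERVER * HBM_GB_PER_CHIP / 1000,
        "bandwidth_tbps": CHIPS_PER_ULTRASERVER * BANDWIDTH_TBPS_PER_CHIP,
    }


for name, value in ultraserver_totals().items():
    print(f"{name}: {value:.2f}")
```

The products come out to roughly 362.9 petaflops, 20.7 TB, and 705.6 TB/s, which agrees with the quoted 362, 20.7, and 706 once rounding is taken into account.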

In production benchmarks: 3x faster performance than Trainium2 on Amazon Bedrock, 5x higher output tokens per megawatt, and 3x better power efficiency than any other accelerator on the service.

Customers including Anthropic, Karakuri, and SplashMusic are reporting training and inference cost reductions up to 50% with Trainium technology.

AWS also announced Trainium4 is in development with 6x FP4 throughput, 3x FP8 performance, and 4x more memory bandwidth. Notably, Trainium4 will integrate Nvidia's NVLink Fusion interconnect, enabling mixed GPU and Trainium deployments in the same racks.

Graviton5: The Quiet Infrastructure Win

AWS introduced Graviton5 processors—192 cores per chip with 5x larger cache than the previous generation. New EC2 M9g instances deliver 25% higher performance than Graviton4-based instances.

For teams running general compute workloads alongside AI inference, Graviton5 provides the density improvements needed to keep non-AI infrastructure costs in check.

AWS AI Factories: Bringing AWS AI to Your Data Center

For organizations with data sovereignty requirements, AWS AI Factories allows deployment of AWS AI infrastructure in customer data centers. The solution combines Nvidia GPUs, Trainium chips, AWS networking, and services like Bedrock and SageMaker AI.

HUMAIN in Saudi Arabia is building an "AI Zone" featuring up to 150,000 AI chips in a purpose-built facility using this approach.

AWS Transform: Modernizing Legacy Code with AI

AWS Transform received significant upgrades for code modernization. The service now uses AI to learn your organization's patterns and automate transformations across repositories—cutting execution time by up to 80%.

New capabilities include full-stack Windows modernization (app code, UI frameworks, databases, deployment configs) and mainframe "Reimagine" features that transform legacy applications into cloud-native architectures.

AWS demonstrated this by literally dropping a decommissioned server rack 120 feet at re:Invent—a theatrical way to make the point that organizations spend 30% of engineering time on tech debt.

Other Notable Announcements

Aurora DSQL is now generally available—the distributed SQL database with active-active multi-region support that AWS previewed last year.

Database Savings Plans offer up to 35% cost reduction for committed database usage. After six years of customer complaints, AWS finally delivered predictable database pricing.
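Whether a commitment pays off depends on how steady your usage is. The sketch below uses a deliberately simplified billing model that is an assumption, not AWS's published pricing logic: you pay the hourly commitment no matter what, covered usage is discounted by a flat rate, and anything beyond the covered amount is billed on demand. The 35% default is the "up to" figure from the announcement, so real discounts may be lower. The function name and parameters are hypothetical.

```python
def hourly_cost_with_plan(on_demand_usage: float,
                          commitment: float,
                          discount: float = 0.35) -> float:
    """Hourly bill under a simplified savings-plan model (an assumption).

    on_demand_usage: what the hour's usage would cost at on-demand rates.
    commitment: committed hourly spend, billed whether or not it is used.
    discount: fractional discount applied to usage covered by the commitment.
    """
    # On-demand value of the usage that the commitment can absorb.
    covered_capacity = commitment / (1 - discount)
    # Usage beyond that is billed at on-demand rates.
    overflow = max(0.0, on_demand_usage - covered_capacity)
    return commitment + overflow


# Steady usage: $10.00/hour on demand costs $6.50/hour with a $6.50 commitment.
print(hourly_cost_with_plan(10.0, 6.5))
# Under-use: $5.00/hour of usage still costs the full $6.50 commitment.
print(hourly_cost_with_plan(5.0, 6.5))
```

Under this model, the plan only saves money in hours where on-demand usage exceeds the commitment itself, which is why predictable workloads are the natural fit.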

Lambda Durable Functions enable multi-step applications that coordinate over extended periods, from seconds up to one year, without paying for idle compute.

New models in Bedrock include Mistral Large 3, Ministral 3, Google Gemma 3, MiniMax M2, and Nvidia Nemotron.

AWS + Google multicloud networking was announced—a notable collaboration given the competitive dynamics between cloud providers.

The Bigger Picture

AWS's thesis is clear: AI assistants are giving way to AI agents. The shift from tools that answer questions to systems that complete multi-step work autonomously represents a fundamental change in how software gets built.

But a note of caution: analysts at Forrester observed that while AWS is "thinking ahead," most enterprises are still piloting AI projects and aren't at the maturity level these tools assume. An MIT study found 95% of enterprises aren't seeing ROI from AI yet.

Werner Vogels, in what was announced as his final keynote, framed this tension well. When asked if AI will take developers' jobs, he said: "Maybe. Will AI make me obsolete? Absolutely not... if you evolve."

The practical implications for practitioners:

  • Test Nova 2 Lite's adjustable reasoning for production inference cost optimization.
  • If you need deep domain customization and are committed to AWS, evaluate Nova Forge against your current fine-tuning approach.
  • Frontier Agents are preview releases. Watch how early adopters handle the oversight challenges before committing.
  • Trainium3 is now production-ready. Customers like Anthropic report cost reductions of up to 50%, which warrants evaluation for both training and inference workloads.
  • Database Savings Plans are worth calculating if you have predictable database workloads.

The tools are maturing faster than most organizations' ability to adopt them. That gap creates both opportunity and risk.

Choose your experiments wisely.


Thanks for reading. If you found this useful, share it with someone building on AWS.
