The Agent Infrastructure You Were Building by Hand
Anthropic just shipped the managed version of what you've been duct-taping together with Claude Code, CLAUDE.md files, and hope.
If you’ve tried to put an AI agent into a real product — not a chatbot, an actual agent that reads files, runs code, browses the web, and iterates on its own work — you know how much plumbing it takes. The agent loop, the sandboxing, the error recovery, the state management, the observability. You end up spending more time on infrastructure than on the agent itself.
On April 8, Anthropic launched Claude Managed Agents — a fully managed cloud environment that handles all of that infrastructure so you can focus on what the agent actually does.
This isn’t about Claude Code or improving your personal coding workflow. This is about embedding AI agents into software products — SaaS platforms, internal tools, customer-facing applications — at scale.
The TL;DR
Software products used to be fixed code. You write the logic, deploy it, users interact with it. The code does exactly what you programmed, nothing more.
That’s changing. The new model: the AI is literally inside the product, doing the work.
Notion doesn’t just store your docs — an agent inside Notion generates presentations and ships code for you. Asana doesn’t just track your tasks — an AI Teammate picks up the task and drafts the deliverable. Sentry doesn’t just report the bug — an agent writes the patch and opens the PR.
The product isn’t the interface anymore. The product is the agent that does the work through the interface.
The problem: building that agent infrastructure was brutal. Containers, sandboxing, error recovery, tool orchestration, context management — months of plumbing before you could ship a single agent feature.
Claude Managed Agents makes this dramatically easier. Anthropic handles the entire runtime. You define what the agent should do, point it at your tools and data, and it runs in their cloud. What used to take months of infrastructure work now takes days.
That’s the shift. Not “AI-assisted software.” Agent-powered software. And the barrier to building it just dropped by an order of magnitude.
What Problem This Solves
Building a production agent today means building everything yourself: the execution container, the tool orchestration layer, the prompt caching, the context management when conversations get long, the error recovery when things fail, the observability to know what your agent is actually doing.
Managed Agents eliminates that entire stack. You define what the agent should do. Anthropic handles how it runs.
How It Works
Four concepts make up the architecture:
- Agent — A reusable, versioned config: which model to use, the system prompt, which tools are available, MCP server connections, and skills. You create it once and reference it by ID across sessions.
- Environment — The container template: pre-installed packages, networking rules. Each session gets its own isolated Ubuntu 22.04 instance with up to 8GB RAM and 10GB disk.
- Session — A running agent instance. It has its own filesystem, conversation history, and state. This is where the actual work happens.
- Events — the communication channel. You send messages into the session via the API; the agent streams results back over SSE. You can interrupt, steer, or provide tool results mid-execution.
Create an agent. Create an environment. Start a session. Send events. That’s it.
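Since the exact API surface isn't quoted anywhere in the announcement, the flow can only be sketched. The functions below build the payload you'd construct at each of the four steps; every field name is an illustrative guess, not the documented schema:

```python
# Hypothetical payloads for the four-step flow. Every field name here is an
# illustrative guess, not the documented API.

def create_agent_payload(name, model, system_prompt, tools):
    """Agent: a reusable, versioned config, referenced later by ID."""
    return {"name": name, "model": model,
            "system_prompt": system_prompt, "tools": tools}

def create_environment_payload(packages, network="disabled"):
    """Environment: the container template (packages, networking rules)."""
    return {"packages": packages, "network": network}

def start_session_payload(agent_id, environment_id):
    """Session: a running instance pairing an agent config with a container."""
    return {"agent_id": agent_id, "environment_id": environment_id}

def user_event(text):
    """Event: a message sent into the session; results stream back over SSE."""
    return {"type": "user_message", "content": text}
```

The point of the split: the agent and environment are created once and reused by ID, while sessions are disposable.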
The Toolset Is Identical to Claude Code
If you’ve used Claude Code, the built-in tools will look familiar:
- bash — execute shell commands
- read, write, edit — file operations
- glob, grep — search
- web_fetch, web_search — internet access
Same names, same patterns. The difference is that your agent runs in an isolated container on Anthropic’s cloud, not on your laptop. Their infrastructure, their sandboxing, their error recovery.
Building It Yourself vs Managed Agents
To make the difference concrete — here’s what you handle yourself today versus what Managed Agents takes off your plate:
- Container/sandbox — Today: you spin up Docker, manage images, handle resource limits, clean up orphaned containers. Managed Agents: each session gets an isolated Ubuntu 22.04 container automatically. You never touch Docker.
- Agent loop — Today: you write the while-loop that sends a prompt, parses tool calls, executes them, feeds results back, handles errors, and decides when to stop. Managed Agents: the harness runs the loop. You send events in and stream results out.
- Tool execution — Today: you build the bash executor, the file I/O layer, the web fetcher, permission checks, timeout handling. Managed Agents: eight built-in tools with configurable permissions, ready to go.
- Context management — Today: you track token counts, decide when to summarize, build your own compaction logic to keep conversations under the limit. Managed Agents: automatic prompt caching and context compaction, built into the harness.
- Error recovery — Today: you handle API failures, tool crashes, rate limits, container OOM kills, and network timeouts with your own retry logic. Managed Agents: automatic rescheduling on transient errors. Sessions resume where they left off.
- Observability — Today: you instrument your own logging, token tracking, cost monitoring. Managed Agents: span events with model request timing and token counts streamed to you in real-time.
- Multi-agent coordination — Today: you build the orchestration layer, message passing, shared state management, thread isolation. Managed Agents: built-in coordinator pattern with isolated threads and condensed views.
- Output evaluation — Today: you eyeball the results or write custom test harnesses. Managed Agents: the Outcomes feature provisions a separate grader model that scores against your rubric and drives iteration automatically.
- Persistent state — Today: you build a database layer for agent memory, handle versioning, deal with concurrent writes. Managed Agents: workspace-scoped memory stores with versioning, optimistic concurrency, and audit trails.
The punchline: every “Today” item in that list is infrastructure you have to maintain that adds zero value to what your agent actually does. Managed Agents is Anthropic saying “we’ll handle the plumbing, you handle the product.”
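To make the "agent loop" row concrete, here's a minimal sketch of the while-loop you'd otherwise maintain yourself. The model call and tool are trivial stubs; a real loop adds streaming, retries, and rate-limit handling on top of this shape:

```python
# Minimal sketch of the DIY agent loop that Managed Agents replaces.
# `call_model` is a stand-in for a real LLM API call.

def call_model(history):
    # Fake model: requests one tool call, then finishes.
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "bash", "input": "echo hello"}
    return {"type": "final", "content": "done"}

TOOLS = {"bash": lambda cmd: f"ran: {cmd}"}  # stub tool executor

def agent_loop(prompt, max_steps=10):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):           # you decide when to stop
        reply = call_model(history)
        if reply["type"] == "final":     # model says it's done
            return reply["content"]
        tool = TOOLS[reply["tool"]]      # parse and dispatch the tool call
        try:
            result = tool(reply["input"])
        except Exception as exc:         # your own error recovery
            result = f"error: {exc}"
        history.append({"role": "tool", "content": result})  # feed results back
    raise RuntimeError("step limit reached")
```

Every branch here (stop condition, dispatch, error handling) is code you write, test, and maintain in the build-it-yourself world.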
Three Features That Change the Game
The infrastructure elimination is nice. But three capabilities in research preview are what make this genuinely new:
Outcomes
You define what “done” looks like with a rubric. The system provisions a separate grader — a different model context — that evaluates the agent’s output against your criteria, identifies specific gaps, and feeds them back to the agent for iteration. Up to 20 rounds.
This isn’t “check your work.” It’s a dedicated evaluation model running in parallel, scoring against explicit criteria. Anthropic reports it improves task success by up to 10 points over standard prompting loops.
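The mechanism can be sketched as a grade-and-revise loop. The grader and revision functions below are trivial stubs standing in for separate model contexts; only the loop shape mirrors what's described:

```python
# Sketch of the Outcomes pattern: a separate grader scores output against a
# rubric and feeds the gaps back for another round. `grade` and `revise` are
# stubs; in the real feature each is a distinct model context.

def grade(output, rubric):
    """Stub grader: a criterion passes if its keyword appears in the output."""
    return [c for c in rubric if c not in output]

def revise(output, gaps):
    """Stub revision: a real agent would address each identified gap."""
    return output + " " + " ".join(gaps)

def outcome_loop(output, rubric, max_rounds=20):
    for round_no in range(max_rounds):
        gaps = grade(output, rubric)   # separate evaluation pass
        if not gaps:
            return output, round_no    # rubric satisfied
        output = revise(output, gaps)  # feed specific gaps back to the agent
    return output, max_rounds          # give up after the round cap
```

The cap of 20 rounds matches the limit stated above; everything else is assumed structure.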
Multi-Agent Orchestration
One agent can coordinate multiple sub-agents within a single session. They share the filesystem but each runs in its own thread with isolated context. The coordinator sees a condensed view; you can drill into individual threads.
Think: a code review agent, a test generation agent, and a documentation agent — all orchestrated by a single coordinator, sharing the same codebase but maintaining separate conversation histories.
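A rough sketch of that pattern, assuming nothing about the real API: sub-agents run in their own threads with isolated histories, write to a shared workspace, and report only a condensed summary back to the coordinator:

```python
import threading

# Sketch of the coordinator pattern: sub-agents share a workspace (standing in
# for the shared filesystem) but keep isolated per-thread conversation
# histories; the coordinator sees only condensed summaries.

shared_workspace = {}  # stands in for the session's shared filesystem

def sub_agent(name, task, summaries):
    history = [f"task: {task}"]       # isolated context, local to this thread
    shared_workspace[name] = f"{name} output for: {task}"
    history.append("done")
    summaries[name] = f"{name}: completed in {len(history)} turns"

def coordinate(tasks):
    summaries = {}
    threads = [threading.Thread(target=sub_agent, args=(name, task, summaries))
               for name, task in tasks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return summaries                  # the coordinator's condensed view
```

Drilling into an individual thread would mean reading its full `history`; the coordinator only ever sees the one-line summaries.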
Persistent Memory
Memory stores that survive across sessions. The agent checks them before starting and writes durable learnings when done. Workspace-scoped, versioned, with audit trails. If you’ve used Claude Code’s .claude/memory/ directory, this is the production-grade API version of the same concept.
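A minimal sketch of such a store, showing versioning and optimistic concurrency (a write is rejected unless the caller saw the latest version). The class and method names are mine, not the API's:

```python
# Sketch of a workspace-scoped memory store with versioning, optimistic
# concurrency, and an audit trail. Names are illustrative, not the real API.

class MemoryStore:
    def __init__(self):
        self._entries = {}   # key -> (value, version)
        self.audit_log = []  # append-only trail of accepted writes

    def read(self, key):
        """Return (value, version); version 0 means the key is unset."""
        return self._entries.get(key, (None, 0))

    def write(self, key, value, expected_version):
        """Optimistic concurrency: accept only if no one wrote in between."""
        _, current = self._entries.get(key, (None, 0))
        if expected_version != current:
            return False                       # stale read, caller must retry
        self._entries[key] = (value, current + 1)
        self.audit_log.append((key, value, current + 1))
        return True
```

The agent-facing flow is read-before-start, write-after-finish: read a key, do the work, then write back with the version you read, retrying on conflict.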
What This Looks Like in Practice
The early adopters make the use case concrete:
Notion embedded Managed Agents directly into workspaces. Engineers ship code and knowledge workers generate presentations and websites without leaving the platform — dozens of parallel agent tasks running while teams collaborate on the outputs simultaneously.
Asana built what they call AI Teammates. Agents sit inside project management workflows, pick up assigned tasks, draft deliverables, and hand them back for human review. The agent is a team member with a task list.
Rakuten deployed agents across five business functions, each plugged into Slack and Teams. Each function went live in under a week. Task turnaround dropped from 24 days to 5 — a 79% reduction.
Sentry paired their debugging agent with a Managed Agent that writes patches and opens PRs automatically. The integration shipped in weeks instead of the months it would have taken to build the infrastructure from scratch.
The pattern: companies aren’t using this to build chatbots. They’re embedding agents into existing products as autonomous workers that take assignments, use tools, and deliver results.
MCP Is First-Class
MCP servers are a core part of the architecture. Agent creation declares servers by name and URL. Session creation supplies auth via vault IDs, keeping secrets out of reusable agent definitions.
One limitation: only remote HTTP MCP servers are supported — no local stdio servers. Your agent talks to MCP endpoints over the network, not local processes.
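The split might look like this (field names are illustrative guesses): the reusable agent config names the MCP server, while the secret arrives only at session creation as a vault reference:

```python
# Illustrative split between agent definition and session-time auth.
# Field names are guesses; the point is that the reusable config holds no
# secrets, only a server name and URL.

agent_config = {
    "name": "support-triage",
    "mcp_servers": [
        # Remote HTTP only: no local stdio servers.
        {"name": "crm", "url": "https://mcp.example.com/crm"},
    ],
}

session_request = {
    "agent_id": "agent_123",
    # Auth is supplied per session via a vault reference, so the agent
    # definition stays shareable and secret-free.
    "mcp_auth": {"crm": {"vault_id": "vault_abc"}},
}
```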
The Container
Each session gets its own isolated container with a comprehensive runtime stack:
- Python 3.12+, Node.js 20+, Go 1.22+, Rust 1.77+, Java 21+, Ruby 3.3+, PHP 8.3+, C/C++ GCC 13+
- All major package managers (pip, uv, npm, yarn, cargo, gem, composer, maven, gradle)
- SQLite, PostgreSQL client, Redis client
- Git, curl, ripgrep, docker (limited), tmux, vim
- Network: disabled by default (configurable to unrestricted or allow-list)
8GB RAM. 10GB disk. No GUI. A headless coding environment, not a general-purpose VM.
Pricing
$0.08 per session hour plus standard Claude token costs.
That’s 8 cents an hour for the container and orchestration infrastructure. A 10-minute session runs about 1.3 cents. The real cost is tokens, same as any Claude API usage. Compare that to running your own container infrastructure with equivalent specs: the engineering time alone costs orders of magnitude more.
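The arithmetic, with placeholder token rates (the per-million-token prices below are assumptions for illustration, not quoted pricing):

```python
# Back-of-envelope session cost: infrastructure fee plus token costs.
# SESSION_RATE comes from the announced pricing; the token rates are
# placeholder assumptions, not quoted prices.

SESSION_RATE = 0.08  # dollars per session-hour

def session_cost(minutes, input_tokens, output_tokens,
                 in_price_per_mtok=3.0, out_price_per_mtok=15.0):
    infra = SESSION_RATE * minutes / 60
    tokens = (input_tokens / 1e6) * in_price_per_mtok \
           + (output_tokens / 1e6) * out_price_per_mtok
    return round(infra + tokens, 4)
```

At any realistic token volume, the token line dwarfs the infrastructure line, which is the announcement's point: the runtime is effectively priced as a rounding error.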
Who This Is For
Use Managed Agents if you’re embedding agent capabilities into a product (SaaS, internal tools, customer-facing apps), need agents running asynchronously for hours, want multi-agent orchestration without building coordination, need rubric-graded output evaluation, or are deploying agents at scale across multiple users.
This is not a replacement for Claude Code. If you’re a developer working interactively on your own codebase, using IDE integrations, or building for yourself — Claude Code is still the right tool. Managed Agents is for when you’re building agent capabilities into something else.
What This Means
There’s a take going around that this kills 1,000+ startups. That’s only half right.
What it kills is the scaffolding — the boilerplate agent loops, the container management, the tool execution plumbing. If you were building a thin wrapper around Claude with nothing but prompt engineering, that’s now a commodity.
But if you’re solving domain-specific problems — proprietary workflows, specialized data pipelines, industry-specific compliance — you just got a better foundation. Managed Agents is infrastructure. The product is what you build on top.
The signal worth paying attention to: Anthropic isn’t just building a better chatbot. They’re building the platform for agent-powered software.