Grok Build, or the Day the CLI Became a Standard

xAI shipped Grok Build this week — same AGENTS.md, same MCP, same Skills, same plugins, same hooks. The interface just became a standard. The war moves to the model.

xAI shipped Grok Build yesterday. It is a coding agent and a CLI. It is also, structurally, the same product as Claude Code. Same conventions file. Same plugin format. Same Skills. Same MCP servers. Same hook scripts. Same plan-then-execute shape. Different model underneath, different price tag on top, different logo in the corner of the terminal. That convergence is the story.

I have 1,282 logged hours in Claude Code. It is still my daily driver. The question on the desk this week is not whether to switch — the answer is no — but what it means that a serious competitor can clone the surface this completely, this fast, with xAI's own docs explicitly stating "fully compatible with Claude Code with zero configuration."

📋
In six bullets

Grok Build is xAI's Claude Code answer. Beta opened May 14, 2026 for SuperGrok Heavy ($299/mo, $99 intro for six months). Terminal-native CLI. Grok 4.3 beta with a 16-agent Heavy architecture and a 2 million token context window.

It speaks the open-standard agent dialect verbatim. AGENTS.md (OpenAI's convention, now Linux Foundation), MCP servers (Anthropic), Skills (Anthropic, open-standardized in December), plugins, hooks. Drop the CLI into a repo wired for Claude Code and it works.

Plan Mode is gated execution by default. Approve the plan, comment on individual steps, or rewrite it before any code moves. Then clean diffs.

Eight parallel subagents from day one. Plan, search, build. Worktree integrations. Headless mode for CI. ACP support for orchestration.

The differentiator is the model, not the CLI. Grok 4.3 Heavy is the bet. The surface stopped being a moat the moment the standards landed under the Linux Foundation.

What this means for operators. The interface war is over. The model war is on. Hedging is now a one-line install, not a re-platforming.

What Grok Build actually is

xAI calls Grok Build "a powerful new coding agent and CLI for professional software engineering and complex coding work." The phrasing is almost a copy of how Anthropic positioned Claude Code at launch. That is not an accident.

Under the hood is Grok 4.3 beta. Launch coverage describes a Heavy-style multi-agent path with up to 2 million tokens of context. xAI's public API page currently lists grok-4.3 at 1 million tokens, so take the larger figure as marketing-forward until the docs converge. The CLI itself orchestrates up to eight parallel subagents in a three-stage plan, search, build cycle. The thing you actually run is a terminal program. The thing it actually does is delegate to a small fleet of agents under one window.

Install is one line: `curl -fsSL https://x.ai/cli/install.sh | bash`. Same shape as every modern agent CLI launch this year.

Pricing is the cost-of-entry question. The early beta is gated to SuperGrok Heavy at the standard $299 per month, with an introductory deal at $99 per month for the first six months. That sits above Claude's Max tiers and well above Claude Pro. xAI is using the CLI to pull people up the subscription ladder, not the other way around.

Below is the feature surface as it ships in beta.

| Capability | Grok Build (beta, May 14) | Claude Code (Anthropic, current) |
| --- | --- | --- |
| Model | Grok 4.3 beta, 16-agent Heavy | Claude Opus 4.7 / Sonnet 4.6 |
| Context window | 2,000,000 tokens | 200,000 tokens (Sonnet 4.6, with long-context extension) |
| Parallel subagents | Up to 8 (plan / search / build) | Subagents via Task tool, no fixed cap |
| Plan Mode | Gated execution by default; approve, comment, or rewrite the plan | ExitPlanMode flow with manual approval |
| Conventions file | AGENTS.md (Linux Foundation standard) | CLAUDE.md + AGENTS.md compatible |
| Tool protocol | MCP servers (Anthropic-authored standard) | MCP servers (native) |
| Reusable procedures | Skills (Anthropic open standard, Dec 2025) | Skills (native) |
| Lifecycle hooks | Hooks compatible | Hooks (PreToolUse, PostToolUse, Stop, etc.) |
| Plugins | Plugins compatible | Plugins (marketplaces, install via /plugin) |
| Headless mode | `-p` flag for scripts and CI | `-p` flag for scripts and CI |
| Agent Coordination Protocol | Built-in ACP support | No native ACP yet; agent comms via MCP / subagents |
| Worktree integration | Native | Native via /worktree |
| Pricing tier | SuperGrok Heavy ($299/mo, $99 intro) | Anthropic API or Pro ($20+) / Max ($100+) plans |

Read down the Grok Build column. Every row that is not the model itself is a row where Grok Build ships parity. xAI did not invent a new convention. It adopted the existing ones and differentiated where it could: parameters, context length, parallelism, and price.

The interface stopped being a moat in December

Two things happened at the end of 2025 that determined the shape of every CLI launched this year.

First, in December 2025, Anthropic, OpenAI and Block formed the Agentic AI Foundation under the Linux Foundation. Anthropic donated MCP. Block donated Goose. OpenAI donated AGENTS.md. Skills became an open standard shortly after. By April 2026 the Foundation had over 170 member organizations including Google, Microsoft, AWS, Cloudflare and Bloomberg. The agent interface stopped being proprietary surface area and started being something closer to HTTP.

Second, AGENTS.md, which OpenAI quietly released in August 2025 as a README-for-agents format, hit 60,000 open-source repositories within nine months. Codex reads it. Cursor reads it. Devin reads it. Gemini CLI reads it. GitHub Copilot reads it. Jules reads it. VS Code reads it. Claude Code reads it. Now Grok Build reads it.
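For anyone who has not opened one, an AGENTS.md is just markdown prose addressed to the agent rather than to a human contributor. A minimal example looks something like this (contents invented for illustration, not taken from any real repo):

```markdown
# AGENTS.md

## Setup
- Install dependencies with `pnpm install`; run tests with `pnpm test`.

## Conventions
- TypeScript strict mode; no default exports.
- Never commit directly to main; open a branch per task.

## Verification
- Every change must pass `pnpm test` and `pnpm lint` before you report done.
```

The format is deliberately free-form: any markdown is valid, which is a large part of why nine different tools could adopt it without a compatibility matrix.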

This is why Grok Build looks like Claude Code. It is not because xAI cloned Anthropic. It is because the conventions that Claude Code popularised stopped belonging to Anthropic somewhere around the time of the donation. AGENTS.md and MCP are AAIF projects under the Linux Foundation. Skills are now an open standard too. Whoever ships a new CLI in 2026 either implements that stack or builds a worse experience on purpose.

xAI made the obvious choice. They implemented the stack. This is not theft. This is what happens when a good convention escapes the original product.

The political read here is the interesting one: the company that probably benefits most from open standards is xAI, because their model lead is the part of the bet they want to defend, and the interface battle is one they would have lost trying to fight alone. Standardising the surface lets them ship a credible product on day one without asking developers to learn a new conventions file or rewire their hooks.

What is actually different

Three things are not the same as Claude Code.

Context window

Two million tokens is the number being quoted around launch coverage. xAI's own current API model page lists grok-4.3 at one million tokens. Either coverage is talking about a beta path the API page has not caught up to, or the 2M number is marketing-forward and 1M is what actually ships at the API layer today. I would not build procurement policy on the larger number until xAI's docs are unambiguous.

Either number is large. Anthropic ships up to 1 million tokens on Claude Opus 4.7 and Sonnet 4.6 in Max, Team, and Enterprise tiers, plus a sliding compaction approach that effectively gives Claude Code far more in practice through compact_20260112 and friends. The honest framing is that raw window and managed compaction are two different bets on the same problem. Raw window is simpler. Compaction is more economical. For most coding tasks the context window is not the bottleneck; retrieval into the window is.

Sixteen-agent Heavy architecture

Grok 4.3 Heavy runs 16 internal agents that vote and converge on responses. Most users will never see this; it shows up as better answers on hard reasoning tasks at a higher latency than the single-agent variant. Grok Build then layers eight parallel subagents on top of that, each of which can be a Heavy instance. The compute footprint is substantial. Whether it shows up as measurably better output for software engineering tasks is what 90 days of dogfood will determine.

Plan Mode as the default execution gate

Claude Code has an ExitPlanMode flow, and it is good. Grok Build foregrounds Plan Mode as a first-class launch feature: approve the plan, comment on individual steps, or rewrite it entirely before code moves. Then diffs appear. The product bet is that users skip planning unless the tool makes the gate visible. Anthropic could match this defaulting in a minor version. xAI made it the shipping shape.

What changes for the AI-native operator

If you are running Claude Code today for production work, the honest answer is: nothing immediate. Anthropic still leads on model quality for code, Claude Code has more ecosystem (more plugins, more public skills, more MCP servers in the wild), and switching cost is real even when the conventions are the same.

What changes is the cost of hedging. Six months ago, testing a second coding agent meant rewriting your conventions file, rewiring your hooks, learning a new plugin format, and probably installing a different MCP server marketplace. Today, with AGENTS.md and MCP and Skills shared across the ecosystem, the test looks like this:

```shell
# install Grok Build
curl -fsSL https://x.ai/cli/install.sh | bash

# run it in your existing Claude Code repo
cd ~/Projects/your-actual-work
grok
# then switch to Plan Mode in the TUI

# Grok should discover your AGENTS.md, MCP servers, Skills, hooks,
# and plugins with little or no rewiring.
```

That is not a re-platforming. That is a model swap. The CLI is now the boring layer.
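The same portability applies to tool wiring. Claude Code reads project-scoped MCP servers from a `.mcp.json` at the repo root; a file like the one below is the kind of thing Grok Build would pick up unchanged if the compatibility claim holds (the server choice and connection string are illustrative, and Grok Build's discovery of this file is xAI's claim, not something I have verified):

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/mydb"]
    }
  }
}
```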

Decision framework for an operator already on Claude Code:

Stay on Claude Code as primary if your workflow depends on the latest Anthropic models, mature plugin ecosystem, or the specific quality profile of Claude Opus on hard reasoning. This is most operators today.

Run Grok Build as a hedge / second opinion if you already pay for SuperGrok Heavy, you have repos where 2M context matters, or you want to A/B test the same task across two top-tier models. The Two Models, One Branch pattern generalises: Grok writes, Claude reviews. Or the other way around.

Switch entirely only if your domain genuinely benefits from the Grok 4.3 Heavy profile (still TBD), if you are deeply embedded in the X ecosystem, or if cost economics on a particular workload favour SuperGrok over Anthropic API pricing. Most readers will not be here for at least one more model cycle.
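The hedge option above can be sketched in a few lines of shell. This is a dry-run sketch, not a tested workflow: it assumes both CLIs honour the headless `-p` flag the comparison table describes, and the task prompt and file path are invented. The `run` helper echoes each command instead of executing it, so the shape is readable without either CLI installed.

```shell
#!/usr/bin/env bash
# Two Models, One Branch — dry-run sketch.
set -euo pipefail

# Echo commands instead of executing them; drop this helper
# (and the quoting around the pipeline) to run the real thing.
run() { echo "+ $*"; }

TASK="Add input validation to src/api/handlers.py"  # illustrative task

# 1. Grok writes: execute the task headless in the current repo.
run grok -p "$TASK"

# 2. Claude reviews: feed the resulting diff back for a second opinion.
run "git diff | claude -p 'Review this diff; list concrete issues, do not rewrite.'"
```

Swap the two model names and the same script is the reverse pattern: Claude writes, Grok reviews.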

Eight hot takes

1. The surface war is over. The model war is on. Every CLI from here forward will implement AGENTS.md, MCP, and Skills, because not doing so is opting out of an ecosystem with 60,000 repos and 170 corporate members. The differentiation moves up the stack to model quality, plan execution discipline, and verification.

2. xAI made the right move standardising. They could not afford a two-year ecosystem build. Adopting the Linux Foundation stack got them to feature parity on day one and let them spend their engineering on the model. This is the same reason browser vendors implement Web standards: differentiation lives where the standards do not, not where they do.

3. SuperGrok Heavy at $299 per month is the most expensive entry point in the agent CLI market right now. Anthropic gives Claude Code to everyone with API access. OpenAI includes Codex across ChatGPT Plus and Pro plans, with higher usage on Pro tiers. That is subscription packaging as much as product strategy.

4. Raw context is easy to overvalue in software engineering. The hard problems are not solved by feeding more code into the window. They are solved by better retrieval into the window, better planning over the retrieved scope, and better verification of what comes out. Context is necessary; it has stopped being the bottleneck.

5. Foregrounding Plan Mode is the kind of small product decision that compounds. The bet is that users skip planning unless the tool makes the gate visible. Making the gate first-class is a quality win. Anthropic should match this in a minor release.

6. The talent exodus story matters less than the press makes of it. TechCrunch, citing The Information, reported more than 50 researchers and engineers have left since the February 2026 SpaceX-xAI merger, including leaders across coding, world models, and Grok voice. That xAI still shipped Grok Build with this much surface parity in three months is the actual signal: open standards dramatically reduce the team size needed to ship credible interface parity.

7. The most interesting product question is not Anthropic versus xAI versus OpenAI. It is whether the next coding agent CLI worth using comes from a model lab at all. If the surface is standardised and any model is pluggable, somebody is going to ship a CLI that lets you pick the model per task. The first lab-agnostic CLI built on AGENTS.md plus MCP plus Skills wins the long tail of operators who do not want to marry a single vendor.

8. Plan Mode is not a beginner feature. It is how experienced teams reduce risk. Making the gate default trains operators into the right rhythm without lecturing them.

Closing

Grok Build is a credible product. It is not a clone in the cynical sense. It is xAI shipping the obvious right thing on the obvious right standards, with their model underneath. The terminal agent is becoming a scheduler. The conventions file is becoming portable infrastructure. The moat is moving.

The fight is now about model quality, verification, ecosystem depth, and trust. Anthropic still leads on three of those four. xAI is betting that the fourth one (raw model capability at long context with heavy reasoning) eventually carries the rest.

For an operator already running Claude Code, the change today is small: a new CLI you can hedge with for one line of install. For an operator deciding from scratch, the standards mean the choice is reversible. Pick the one whose model you trust most for your work. The CLAUDE.md plus AGENTS.md plus MCP plus Skills you write today is portable. Bet on the agent-native operating layer, not on the terminal app inhabiting it this quarter.

The interface is now a standard. The model is the bet. That is a much healthier shape for the industry than the alternative was.


📚
Related Reading

Above the Model. The components above the model decide AI-native output quality. Context, dreaming, sycophancy mitigation, verification.

Two Models, One Branch. The simplest multi-model orchestration that works. Grok Build slots into this pattern as the second reviewer.

MCP, From Pidgin to Protocol. Why the MCP standardisation matters and what it changed about how agents speak to systems.

Claude for Legal Is Claude Code Wearing a Suit. Companion piece on vertical-skinned CLI agents and the stack folding inward.
💬
Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting