Spec-Driven Agentic Development

Stop feeding vague ideas to AI agents. Use OpenSpec to collect scattered requirements, synthesize them into structured specs, review as a team, freeze, then build. The framework that eliminates rework.


You have a feature to build. The requirements are in a Linear ticket. Some context is buried in a Slack thread from two weeks ago. There are notes from a meeting you half-remember. A teammate mentioned an edge case in a standup. Your tech lead has opinions about the API design that live exclusively in their head.

Now you sit down, open Claude Code, and type: "Build the notification system."

What you get back will be technically impressive and fundamentally wrong. Not because the AI is bad — but because your input was scattered, incomplete, and ambiguous. I wrote about this in The Verification Gap: vague input, vague output. It’s the single most expensive mistake in AI-assisted development, and most teams make it every day.

There’s a better way. It’s not a new AI model. It’s a discipline backed by a framework.

Write the spec first. Review it as a team. Freeze it. Then build. And with OpenSpec, this becomes structured, repeatable, and agent-native.


The Scattered Input Problem

Every feature starts as distributed knowledge. Requirements live in different places, in different formats, owned by different people:

  • Linear/Jira: The ticket — usually a title and two sentences
  • Slack: Three threads, two channels, one DM with the actual decision
  • Meetings: Verbal agreements nobody wrote down
  • Your head: Technical constraints you know from experience
  • Someone else’s head: Edge cases, business rules, compliance requirements
  • Figma/Docs: Design mockups that may or may not match the latest decisions

This is normal. Requirements should emerge from conversations and collaboration. The problem isn’t that the information is scattered — it’s that we skip the step of assembling it into a coherent specification before we start building.

Instead, we go straight from scattered inputs to code. We treat the AI agent like a mind reader. And then we spend three days iterating on code that should never have been written in the first place — because the foundation was wrong.


Two Approaches

The traditional approach:

Scattered info → Vague prompt → Code → Review → "That's not what I meant" → Rewrite → Repeat

You iterate on code. Every cycle is expensive: the AI rewrites files, tests break, reviewers re-read everything, context shifts. Three rounds of code review later, someone realizes the API contract was wrong from the start.

The spec-driven approach:

Scattered info → Spec draft → Review spec → Iterate spec → Freeze spec → Build from spec → Done

You iterate on Markdown files. Cheap. Fast. No code to rewrite. No tests to fix. No merge conflicts. When the spec is right — when everyone agrees on exactly what to build — you hand it to the agent and get it right the first time.

Code refactors cost hours. Text edits cost seconds. The shift is simple: move the iteration from code to spec.


Enter OpenSpec

OpenSpec is an open-source spec framework designed as a planning layer for AI coding agents. It takes the spec-driven philosophy and gives it structure, repeatability, and direct agent integration.

What it provides:

  • Structured artifacts — proposal, design, specs, and tasks as separate Markdown files
  • Capability-based organization — specs organized by what the system does, not by ticket number
  • Change management — each change proposal lives in its own directory with all context
  • Agent commands — /opsx:propose, /opsx:apply, and /opsx:archive, integrated directly into coding agents
  • Git-native — everything lives in the repo, version-controlled alongside the code
  • Agent-agnostic — works with 30+ coding agents (Claude Code, Cursor, GitHub Copilot, Codex, and more)

No API keys. No external dependencies. Just structured Markdown files in your repo.


The Directory Structure

openspec/
├── specs/                              # Living capability specs (persistent)
│   ├── notification-delivery/
│   │   └── spec.md                     # What the system does right now
│   ├── user-permissions/
│   │   └── spec.md
│   └── payment-processing/
│       └── spec.md
├── changes/                            # Active change proposals
│   ├── add-notification-batching/
│   │   ├── .openspec.yaml              # Change metadata
│   │   ├── proposal.md                 # What & why
│   │   ├── design.md                   # How (technical decisions)
│   │   ├── tasks.md                    # Implementation checklist
│   │   └── specs/
│   │       └── notification-delivery/
│   │           └── spec.md             # Spec delta (what changes)
│   └── archive/                        # Completed changes
│       └── 2026-03-26-add-notification-batching/
│           └── ...

Three directories, three purposes:

  • openspec/changes/<name>/ — Active proposals. Each change gets its own directory with four artifacts. This is where spec work lives during proposal and review.
  • openspec/specs/ — The living truth. Describes what the system does right now. Organized by capability, not by ticket. Populated via /opsx:archive when changes are completed.
  • openspec/changes/archive/ — Completed changes, moved here with a date prefix. Full history of proposals, designs, and tasks alongside the code that implemented them.

The separation is deliberate. Active work goes in changes/. The source of truth lives in specs/. The archive preserves history. At any point, anyone can look at openspec/specs/ and know exactly what the system does today.


The Four Artifacts

Every change proposal in OpenSpec produces four files. Each serves a distinct purpose, and together they form a complete, reviewable description of a change before any code is written.

1. Proposal (proposal.md) — The Why

Concise. One to two pages max. Establishes motivation without getting into implementation details.

## Why

The notification system sends emails one-at-a-time. At scale, this creates
delivery delays of 30+ minutes during peak hours. Users report receiving
notifications long after the triggering event, reducing trust in the platform.

## What Changes

- Add a batching layer that groups notifications by recipient and channel
- Implement configurable batch windows (default: 5 minutes for email, immediate for push)
- Add a digest template that combines multiple notifications into a single email
- Update the delivery pipeline to route through the batch queue

## Capabilities

### Modified Capabilities
- `notification-delivery`: Add batching and digest support

### New Capabilities
- `notification-batching`: Configurable batch windows and digest generation

## Impact

- Delivery pipeline: new batch queue before email dispatch
- Database: new batch_config table, notification_batch junction table
- Templates: new digest email template
- API: new endpoint for batch configuration

The proposal captures what and why without prescribing how. The “Capabilities” section creates the contract between proposal and specs — it names exactly which capabilities are new or modified.

2. Design (design.md) — The How

Technical decisions with rationale. This is where you make choices explicit and debatable before anyone writes code.

## Decisions

### D1: In-process batching, not external queue
Use an in-process timer-based accumulator rather than Redis/SQS. Current
volume (< 10k notifications/hour) doesn't justify external infrastructure.
If we cross 50k/hour, revisit this decision.

### D2: Batch window is per-channel, not per-recipient
Email: 5-minute window (batch). Push: immediate (no batch).
SMS: 15-minute window. Per-recipient batching adds complexity
without clear user benefit at current scale.

### D3: Digest template uses summary format, not list
Instead of listing every notification, the digest shows a summary
("3 new comments on your post") with a deep link. Reduces email
length and improves click-through based on industry benchmarks.

## Risks / Trade-offs
- Batch window adds latency to email delivery by design
- In-process accumulator loses pending batches on restart (acceptable: retry on next cycle)
- Digest format loses individual notification detail (users can click through)

Each decision has a number (D1, D2, D3) so reviews can reference them precisely: "I disagree with D1 because..." instead of "somewhere in the design you said..." This alone eliminates half the confusion in design review threads.
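
To make decisions D1 and D2 concrete, here is a minimal Python sketch of an in-process, timer-based accumulator with per-channel windows. All names (`BatchAccumulator`, `BATCH_WINDOWS`, the dispatch callback) are illustrative, not OpenSpec or production API — a sketch of the design, not the implementation.

```python
import time
from collections import defaultdict

# Per-channel windows from D2, in seconds; 0 means dispatch immediately.
BATCH_WINDOWS = {"email": 300, "push": 0, "sms": 900}

class BatchAccumulator:
    def __init__(self, dispatch):
        self.dispatch = dispatch          # callback: (recipient, channel, [notifications])
        self.pending = defaultdict(list)  # (recipient, channel) -> notifications
        self.deadlines = {}               # (recipient, channel) -> flush time

    def add(self, recipient, channel, notification, now=None):
        now = time.monotonic() if now is None else now
        window = BATCH_WINDOWS.get(channel, 0)
        if window == 0:                   # push: bypass batching entirely
            self.dispatch(recipient, channel, [notification])
            return
        key = (recipient, channel)
        if key not in self.deadlines:     # window starts at the first notification
            self.deadlines[key] = now + window
        self.pending[key].append(notification)

    def tick(self, now=None):
        """Flush every batch whose window has expired (call from a timer)."""
        now = time.monotonic() if now is None else now
        for key in [k for k, d in self.deadlines.items() if d <= now]:
            recipient, channel = key
            self.dispatch(recipient, channel, self.pending.pop(key))
            del self.deadlines[key]

    def flush_all(self):
        """Graceful-shutdown hook: dispatch everything still pending."""
        for (recipient, channel), batch in self.pending.items():
            self.dispatch(recipient, channel, batch)
        self.pending.clear()
        self.deadlines.clear()
```

Note how D1's trade-off shows up directly: no external queue, so `flush_all` exists precisely because an in-process accumulator would otherwise lose pending batches on restart.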

3. Spec (spec.md) — The Contract

Requirements in SHALL format. Scenarios in GIVEN/WHEN/THEN. This is the frozen contract the implementation must satisfy.

## Purpose

Configurable notification batching that groups notifications by recipient
and channel, delivering digests instead of individual messages.

## Requirements

Requirement: The system SHALL accumulate notifications per recipient per channel
during the configured batch window before dispatching.

Requirement: The system SHALL dispatch push notifications immediately regardless
of batch configuration, preserving real-time behavior for mobile.

Requirement: The system SHALL generate a digest email when a batch contains
more than one notification, summarizing all notifications with deep links.

Requirement: The system SHALL flush all pending batches on graceful shutdown,
ensuring no notifications are silently dropped.

## Scenarios

Scenario: Email batching within window
  GIVEN batch_window for email is 5 minutes
  AND user receives notification A at T+0
  AND user receives notification B at T+2min
  WHEN the batch window expires at T+5min
  THEN one digest email is sent containing both A and B

Scenario: Push notification bypasses batching
  GIVEN batch_window for push is 0 (immediate)
  AND user receives a push notification
  WHEN the notification enters the pipeline
  THEN it is delivered immediately without batching

Scenario: Single notification in batch
  GIVEN batch_window for email is 5 minutes
  AND user receives only notification A during the window
  WHEN the batch window expires
  THEN a standard (non-digest) email is sent for notification A

Scenarios are executable specifications. They tell you exactly what to test. An agent implementing this knows precisely what “correct” looks like. A reviewer can check: does every scenario have a corresponding test? Does every test pass?
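
Here is one way the scenarios above map one-to-one onto tests, sketched in Python. The `deliver()` stub is hypothetical, standing in for the real pipeline; the point is the shape — one scenario, one test, GIVEN/WHEN/THEN preserved as comments.

```python
def deliver(batch):
    """Stand-in for the pipeline: >1 notification -> digest, exactly 1 -> standard email."""
    if len(batch) > 1:
        return ("digest", list(batch))
    return ("standard", batch[0])

def test_email_batching_within_window():
    # GIVEN notifications A and B arrive inside the 5-minute window
    # WHEN the batch window expires
    # THEN one digest email is sent containing both
    kind, payload = deliver(["A", "B"])
    assert kind == "digest" and payload == ["A", "B"]

def test_single_notification_in_batch():
    # GIVEN only notification A arrives during the window
    # WHEN the batch window expires
    # THEN a standard (non-digest) email is sent
    kind, payload = deliver(["A"])
    assert kind == "standard" and payload == "A"
```

A reviewer checking spec-to-test coverage can do it mechanically: every `Scenario:` heading should have a matching `test_` function.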

4. Tasks (tasks.md) — The Plan

Checkboxes. Ordered. Each task is a unit of work the agent can execute independently.

## Tasks

### Core batching engine
- [ ] Create BatchAccumulator class with configurable window per channel
- [ ] Add flush_on_shutdown hook to ensure no notifications are lost
- [ ] Wire BatchAccumulator into the notification dispatch pipeline

### Digest generation
- [ ] Create DigestBuilder that merges notifications into summary format
- [ ] Add digest email template with notification count and deep links
- [ ] Route batched notifications through DigestBuilder before dispatch

### Configuration
- [ ] Add batch_config table (channel, window_seconds, enabled)
- [ ] Create migration with default config (email: 300s, push: 0, sms: 900s)
- [ ] Add API endpoint GET/PUT /api/notifications/batch-config

### Tests
- [ ] Test batch accumulation: two notifications within window produce one digest
- [ ] Test push bypass: push notifications dispatch immediately
- [ ] Test single-notification batch: sends standard email, not digest
- [ ] Test shutdown flush: pending batches dispatch on graceful stop
- [ ] Verify all existing notification tests pass unchanged

When you run /opsx:apply, the agent works through these tasks sequentially, checking each off as it goes. The tasks are derived from the design decisions and spec requirements — nothing is invented on the fly.
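
For instance, the "Digest generation" tasks combined with design decision D3 pin down the summary format tightly enough that a sketch is almost mechanical. This Python version is illustrative only — `DigestBuilder` and the `(kind, target)` tuple shape are assumptions, not the article's API.

```python
from collections import Counter

class DigestBuilder:
    """Merge a batch of (kind, target) notifications into D3's summary format."""

    def build(self, notifications):
        # Count identical (kind, target) pairs, e.g. three comments on one post.
        counts = Counter((kind, target) for kind, target in notifications)
        lines = []
        for (kind, target), n in counts.items():
            noun = kind if n == 1 else kind + "s"   # naive pluralization for the sketch
            lines.append(f"{n} new {noun} on {target}")
        return "\n".join(lines)
```

Given three comment notifications on the same post, `build()` yields the D3-style summary "3 new comments on your post" rather than listing each one.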


The Workflow

Step 1: Propose

/opsx:propose notification-batching

The agent gathers context from your codebase, existing specs, and any input you provide (ticket, Slack thread, meeting notes), then generates all four artifacts. You can also write them manually — OpenSpec doesn’t force you through a CLI.

This is the synthesis step. You dump all your scattered information, and the agent produces a structured, coherent proposal. It identifies gaps. It surfaces decisions you haven’t made. It turns implicit assumptions into explicit statements.

Step 2: Draft PR (Spec Only)

Create a pull request containing only the spec artifacts. No implementation code. Mark it as draft.

git checkout -b spec/notification-batching
git add openspec/changes/add-notification-batching/
git commit -m "RFC: Notification batching"
gh pr create --title "RFC: Notification batching" --draft

The RFC: prefix makes it instantly clear: this is a spec review, not a code review. Different reviewers, different expectations, different review criteria.

This is the most reviewable PR your team will ever see. A reviewer doesn’t need to understand code paths, trace execution flows, or run tests. They read a document and answer one question: "Is this what we should build?"

Step 3: Multi-Pass Review

This is where both humans and agents provide input. Multiple eyes, multiple passes:

Human reviewers catch:

  • Business logic errors (“We also need to handle enterprise tier rate limits differently”)
  • Missing requirements (“What about GDPR? Users need to opt out of digest emails”)
  • Scope creep (“Let’s not build the analytics dashboard in v1”)
  • Organizational context the agent can’t know

AI agent reviewers catch:

  • Technical inconsistencies (“Design D2 contradicts requirement 3”)
  • Missing edge cases (“What happens if the batch queue is full on restart?”)
  • Security gaps (“The batch config endpoint needs authentication”)

For both humans and agents, the numbered decision references (D1, D2, D3) keep feedback precise.

You can even have the agent review the spec from a different angle:

Review openspec/changes/add-notification-batching/ as a senior backend engineer.
Focus on: scalability, failure modes, and operational concerns.
What's missing? What will break at scale? What will wake someone up at 3am?

Each perspective surfaces different gaps. Each pass makes the spec tighter. And each iteration costs minutes, not hours, because you’re editing Markdown — not rewriting code.

Step 4: Freeze

When the team approves the PR, merge it. The spec is now frozen. It’s the contract. All scope decisions and edge case handling are locked. If requirements change later, you update the spec first. The spec leads, the code follows.

Step 5: Implement

Create a new branch and PR for implementation:

/opsx:apply

The agent reads the proposal, design, specs, and tasks, then implements each task with full context. It doesn’t guess intent — it works from an explicit, reviewed contract. The difference is night and day compared to “build the notification system.”

Step 6: Archive

After the implementation merges:

/opsx:archive

This does two things:

  1. Syncs spec deltas from openspec/changes/<name>/specs/ into openspec/specs/ — updating the living capability specs to reflect the new behavior
  2. Moves the change to openspec/changes/archive/YYYY-MM-DD-<name>/ — preserving the full proposal, design, and task history

The result: openspec/specs/ always reflects the current state of the system, and the archive provides an audit trail of every change that shaped it.


Why This Works

Structure Reduces Ambiguity

The four-artifact format forces completeness. You can’t skip the “why” (proposal), you can’t skip the “how” (design), you can’t skip the contract (spec), and you can’t skip the plan (tasks). Each artifact builds on the previous one. The proposal unlocks design and specs, which unlock tasks.

Capability-Based Organization Scales

Organizing specs by capability (notification-delivery, user-permissions) instead of by ticket (ALT-8, JIRA-123) means specs outlive sprints. When a new developer joins or a new ticket references notification behavior, they find openspec/specs/notification-delivery/spec.md — not a six-month-old ticket buried in Linear.

Scenarios Are Executable Verification

GIVEN/WHEN/THEN scenarios translate directly to test cases. The agent implementing the spec generates tests that verify the scenarios. Reviewers verify the implementation by checking: does every scenario have a corresponding test? Does every test pass? The verification gap closes.

Iteration Is Cheap

Rewriting a paragraph in a Markdown file takes seconds. Rewriting a service takes hours. When you iterate on the spec, every correction is a text edit. When you iterate on code, every correction is a refactor, a test update, a migration change, and a re-review.

Alignment Happens Before Execution

The most expensive bug in software is a misunderstanding. “That’s not what I meant” costs more than any runtime error. Spec review forces the team to align before anyone writes code. Disagreements surface at the cheapest possible stage.

Agents Get Better Input

An AI agent with a frozen spec produces dramatically better code than an agent with a Slack thread. The spec eliminates ambiguity. The agent doesn’t have to guess what you meant — it’s written down, reviewed, and approved. Better input, better output. Every time.

Git-Native Means No Drift

Everything lives in the repo. No Confluence pages that drift from code. No Notion docs that nobody updates. The spec is versioned alongside the code it describes. git blame tells you when a requirement changed and why.

The Spec Is Documentation

When the feature ships, the spec stays. Six months from now, when someone asks “Why does the notification system batch emails every 5 minutes?” the answer is in openspec/specs/notification-delivery/spec.md, with the archived proposal that explains the reasoning.


Common Objections

"This slows us down"

Spec writing takes an hour. Spec review takes a day. But it eliminates most of the time otherwise spent iterating wrong implementations back and forth. Total delivery time goes down, not up.

"Requirements change constantly"

Good. Change the spec first, then change the code. The spec makes the change visible and reviewable before anyone touches implementation. This is faster than discovering misalignment in a code review.

"Isn't this just waterfall?"

Waterfall specs are 100-page documents reviewed over months. OpenSpec artifacts are 1–2 pages each, reviewed in a day, for a single change. Ship weekly. Iterate constantly. The loop is tight.

"My agent is smart enough to figure it out"

Your agent is smart enough to build exactly what you describe. The question is whether what you described is what you actually need. Specs make your description precise enough that "what you described" and "what you need" converge.

"The spec will get outdated"

Not with OpenSpec’s archive workflow. When you run /opsx:archive, the living specs in openspec/specs/ are automatically updated from the change’s spec deltas. The rule is simple: if the code changes, the spec changes first. And the tooling enforces the sync.


Setting It Up

Installation

npm install -g @fission-ai/openspec@latest
cd your-project
openspec init

This creates the openspec/ directory structure and configures your agent. Works with Claude Code, Cursor, GitHub Copilot, and 30+ other tools out of the box.

Agent Instructions (CLAUDE.md)

Add the spec-first rule to your project’s CLAUDE.md:

### Spec-First for Complex Changes

Any ticket that touches core business logic MUST follow spec-first development.

Process:
1. /opsx:propose — Generate proposal, design, spec, and tasks
2. Draft PR — Spec-only, assign reviewers
3. Iterate on spec feedback
4. Freeze — Merge spec PR
5. /opsx:apply — Implement against frozen spec
6. /opsx:archive — Sync specs to living docs, archive change

PR Naming Convention

# Spec PRs (draft until frozen)
RFC: Notification batching
RFC: User permission redesign

# Implementation PRs (reference the spec)
feat: Add notification batching (openspec/changes/add-notification-batching)

A Spec-Writer Skill

If you use Claude Code skills, create a reusable skill that wraps the full workflow:

# .claude/skills/spec/SKILL.md
---
name: spec
description: Create a spec-first change proposal using OpenSpec
disable-model-invocation: true
argument-hint: [change-name]
---

Run /opsx:propose for: $ARGUMENTS

After generating the artifacts, review them yourself and ask me
for any missing context before I create the draft PR.

Ensure the proposal has:
- Clear problem statement (why)
- Explicit capability mapping
- Numbered design decisions
- SHALL requirements with GIVEN/WHEN/THEN scenarios
- Ordered task checklist

Now /spec notification-batching triggers the entire spec creation workflow.


Closing the Verification Gap

In The Verification Gap, I argued that the biggest risk in AI-assisted development isn’t bad code — it’s the inability to verify that what the agent built is what you actually wanted.

OpenSpec attacks that gap with surgical precision. The spec is the intention, made explicit. SHALL requirements become assertions. GIVEN/WHEN/THEN scenarios become test cases. Design decisions are numbered and reviewable. The gap between intention and implementation shrinks from “Did the agent understand what I was thinking?” to “Does the code match the spec?” — and that’s a much easier question to answer.

Spec-driven development isn’t complicated. It’s structured Markdown files and a PR review. The hard part is the discipline: resisting the urge to skip straight to code. But every team that adopts this pattern reports the same thing: total feature delivery time goes down. Not because the coding is faster — but because the rework disappears.

Better input, better output. It was always that simple.


Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting