The Field That Schedules Your World
A hundred-hour descent into constraint programming. CP-SAT, a Kotlin DSL, MiniZinc, benchmarks against three engines, and a full nurse-scheduling application. Why the quietest branch of AI still runs the world.
Every time a nurse walks into a hospital, an airline dispatches a crew, a delivery truck leaves a depot, or a university publishes its exam timetable, a constraint solver has quietly made a decision that affects someone's life. This is the oldest, quietest, and strangest branch of AI — and most software engineers have never touched it.
I spent a hundred hours descending into it.
ELI5: the Sudoku shape of the world
You know Sudoku. Nine rows, nine columns, nine 3x3 boxes, each digit 1-9 appearing exactly once in each. Give someone a half-filled grid and they don't try every possible arrangement — 9⁸¹ is a universe's worth of possibilities. They look at one empty cell, see which digits can still legally go there given its row, column, and box, and start narrowing things down. When they fill one cell, it propagates: other cells now have fewer legal options too. They keep narrowing until the grid is solved or they hit a wall and have to guess.
That is constraint programming in a sentence. You describe what must be true (each row contains 1-9, each column contains 1-9, each box contains 1-9). A solver figures out what arrangement satisfies the rules. You don't write the search algorithm. You don't write a for-loop. You describe the universe of the problem and let the solver inhabit it.
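The narrowing step is simple enough to act out in a few lines of plain Python. This is a hypothetical toy for intuition, not code from the repo: given a partially filled grid, compute the legal candidates for one empty cell from its row, column, and box.

```python
def candidates(grid, row, col):
    """Legal digits for an empty cell, given its row, column, and 3x3 box.

    grid is a 9x9 list of lists; 0 marks an empty cell.
    """
    seen = set(grid[row])                          # digits already in the row
    seen |= {grid[r][col] for r in range(9)}       # digits already in the column
    br, bc = 3 * (row // 3), 3 * (col // 3)        # top-left of the 3x3 box
    seen |= {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - seen

# A row with 1..8 already placed leaves exactly one candidate for the last cell.
grid = [[0] * 9 for _ in range(9)]
grid[0] = [1, 2, 3, 4, 5, 6, 7, 8, 0]
print(candidates(grid, 0, 8))  # -> {9}
```

Run this after every placement and each filled cell shrinks its neighbors' candidate sets — that feedback loop is the propagation the article describes.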
Now scale up. The problem isn't Sudoku but scheduling 200 nurses across 28 days and 4 shifts, with labor laws about rest periods, required skill coverage per shift, vacation requests, contract hours, and fairness constraints. And just to make it interesting, you want the best legal schedule, not just any legal one.
Welcome to the Nurse Scheduling Problem. Proven NP-hard in 1976. A modern constraint solver will chew through an instance involving 2⁽²⁰⁰ˣ²⁸ˣ⁴⁾ ≈ 10⁶⁷⁰⁰ possible assignments before breakfast.
The stumbling
I stumbled into all of this the way you stumble on a trapdoor in an old library — I was reading about something else and noticed a floorboard that rang hollow. The thing that rang hollow was this: most software engineers I know, myself included, have spent their careers assuming "AI" means neural networks. Gradient descent. Backprop. Transformers. That has been the cultural wave for the last eight years.
But there is an entire parallel tradition that predates deep learning by four decades, still ships in production at Google, Amazon, and every airline you have ever flown, and runs on completely different principles. It is called constraint programming. Some people call it "symbolic AI" or "GOFAI" and say it with a slight air of dismissal.
They are wrong. This stuff still runs the world. It just doesn't have a publicist.
The vocabulary you need, in three words
- Variables. Symbols that take values from a set. `x ∈ {1..9}`. `start_time ∈ {0..1440 minutes}`. `is_nurse_alice_working_monday_night ∈ {0, 1}`.
- Domains. The set of allowed values for each variable. Starts large, shrinks as the solver infers facts.
- Constraints. Relations that must hold. `x ≠ y`. `sum(nurses_working_monday_night) ≥ 3`. `end_of_shift_A + 11h ≤ start_of_shift_B`.
That's the whole thing. You write a model that says "here are the variables, here are the constraints, find me an assignment." No loops. No search. Just the universe.
Modern solvers like Google's CP-SAT (shipped as part of OR-Tools; CP for constraint programming, SAT for the Boolean satisfiability engine it hybridizes with) then hit this description with terrifying machinery: tree search, SAT-style clause learning, specialized propagators for global constraints like AllDifferent and Cumulative, LP relaxations for numeric bounds, and restart heuristics that wouldn't look out of place in a chess engine. The search space for a mid-sized scheduling problem is on the order of 10⁶⁰⁰⁰. The solver finds the optimum in 0.2 seconds.
It's absurd. It feels like cheating. I needed to understand how it works.
So I built a repo
cp-deep-dive is what a hundred hours inside a constraint solver looks like. It isn't a tutorial blog. It isn't a "hello world" collection. It is a working apprenticeship: eighteen chapters, two languages, three solvers, a production-grade nurse-scheduling application, and an agentic knowledge base that indexes every word of it.
The thesis of the repo: you do not learn a field by reading papers; you learn it by modelling things with your own hands, in two languages, against real benchmarks, until the constraints stop surprising you.
Let me give you the tour.
The dual-language rationale
Everything lives in both Python and Kotlin. Most "learn X" repos pick one language. I picked two for a specific reason: the cleanest way to understand a solver's conceptual model is to see it expressed in two very different host languages and notice what is essential versus what is just ceremony.
Python is where CP-SAT's first-party bindings live. It is also where most of academia writes. Clean, dynamic, fast to prototype.
Kotlin is where the ceremony gets interesting. The raw OR-Tools Java API is verbose — designed for Java 6. So the repo contains cpsat-kt, a first-class Kotlin DSL library that wraps OR-Tools and makes it feel native. Here is the entire model for a small optimization problem:
```kotlin
val model = cpModel {
    val x = intVar("x", 0..10)
    val y = intVar("y", 0..10)
    constraint {
        +(3 * x + 2 * y eq 12)
        +(x + y le 5)
    }
    minimize { x + y }
}

val result = model.solveBlocking { randomSeed = 42 }
// result is SolveResult.Optimal(objective = 4L, values = {x=4, y=0})
```
That reads like math. Operator overloading for +, *, eq, le. A DSL block for constraint posting. A typed result ADT (Optimal | Feasible | Infeasible | ModelInvalid). Under the hood it is doing all the model.addLinearExpressionInDomain(...) gymnastics the raw Java API demands.
I have argued before that human language is the best programming language and that a DSL is a programming language with no compiler. This library is the living proof. 41 tests, all green, versioned, and on track to publish as a real artifact.
MiniZinc as Rosetta Stone
Halfway through, the repo takes a detour into MiniZinc — a solver-agnostic declarative modelling language. Why? Because MiniZinc forces you to write the model in pure mathematical form, with no reference to any particular solver. You describe the problem. MiniZinc compiles it down to FlatZinc and hands it to whichever backend you want: CP-SAT, Gecode, Chuffed, or even an MILP solver.
This is the Rosetta Stone moment. Once you have written a model in CP-SAT Python, then again in cpsat-kt Kotlin, then again in MiniZinc, you start to see which parts are the problem and which parts are just the language. The repo uses MiniZinc as a prototyping tool — write it declaratively first, debug at the math level, then port to CP-SAT in production.
The Nurse Scheduling Problem as canonical teacher
Five of the eighteen chapters are the NSP. Not because I have any nursing background — I don't. I chose it because it is the canonical real-world CP scheduling problem: well-studied since the 1970s, has standard benchmark datasets (NSPLib, INRC-I, INRC-II), and maps cleanly onto CP-SAT's Boolean and interval machinery.
The core decision variable is disarmingly simple:
```python
# x[(nurse_id, day, shift_id)] = 1 iff the nurse works that shift on that day
for n in nurses:
    for d in days:
        for s in shift_ids:
            if _cell_allowed(instance, n.id, d, s):
                x[(n.id, d, s)] = model.new_bool_var(f"x_{n.id}_d{d}_{s}")
```
One Boolean per (nurse, day, shift) triple. For 50 nurses x 28 days x 4 shifts, that is 5,600 Booleans. The constraints then layer on top:
- HC-1 coverage — for each shift on each day, between `min` and `max` nurses must be assigned.
- HC-2 one shift per day — `sum(x[(n, d, s)] for s in shifts) ≤ 1` for each nurse/day.
- HC-3 forbidden transitions — can't work night-then-morning, enforced pairwise.
- HC-4 max consecutive working days and HC-5 max consecutive nights — rolling window over any `k`-day slab.
- HC-6 minimum rest hours — derived from shift end times, folded into forbidden transitions.
- HC-7 skill coverage — at least one nurse with skill `X` per shift requiring `X`.
- HC-8 contract hours — rolling 7-day window of shift durations, bounded by contract ± tolerance.
That's the hard constraints. Then come soft constraints (preferences, fairness, workload balance) expressed as penalty terms in an objective function. The repo walks through them in escalating complexity over three chapters. Master NSP and you can model most scheduling problems in the wild.
Benchmarks that tell the truth
The repo ends with an ecosystem chapter that benchmarks CP-SAT against Choco (classic Java CP solver, pure propagation + backtracking) and Timefold (formerly OptaPlanner, local-search metaheuristic engine). Same instances, same constraints, different engines. Here is the v0.1 baseline on two toy instances:
| Instance | Solver | Status | Objective | Wall time |
|---|---|---|---|---|
| toy-01 | CP-SAT | optimal | 3.0 | 0.006 s |
| toy-01 | Choco | optimal | 0.0* | 0.017 s |
| toy-01 | Timefold | feasible | 1.0 | 15.006 s |
| toy-02 | CP-SAT | optimal | 20.0 | 0.225 s |
| toy-02 | Choco | optimal | 0.0* | 2.498 s |
| toy-02 | Timefold | feasible | 1.0 | 15.006 s |
CP-SAT dominates on small instances because its hybrid CP+SAT+LP core shreds through them. Timefold's local search pulls ahead on instances too large for tree search to complete. Choco sits between them with beautifully legible code but no clause-learning engine. There is no One True Solver. There is matching the tool to the problem shape.
The spec-first discipline
Before any backend code existed in the apps/ directory, I wrote and locked ten specification documents under specs/nsp-app/:
```
00-overview.md
01-vision-and-goals.md
02-domain-model-and-glossary.md
03-functional-requirements.md
04-non-functional-requirements.md
05-architecture.md
06-api-contract.md        <- references apps/shared/openapi.yaml
07-data-contracts.md
08-ui-flows.md
09-acceptance-criteria.md
```

The spec was frozen at v1.0. Then the code was written to conform to it. The OpenAPI contract under apps/shared/openapi.yaml was the authority: both the Python FastAPI backend and the Kotlin Ktor backend had to match it, endpoint for endpoint, shape for shape, enum value for enum value.
This is the discipline I wrote about in Spec-Driven Agentic Development. When humans work with AI agents on complex projects, the spec is what keeps both the human and the agent honest. Without a locked spec, every conversation drifts. With one, every pull request is measurable: does this match the spec, yes or no. It is also exactly the pattern I outlined in The Architect's Protocol — humans design, agents execute, specs are the contract.
Twin backends, one contract
Identical OpenAPI contract. Two implementations.
- `apps/py-api/` — FastAPI + SQLModel + sse-starlette + Pydantic v2. Thirteen endpoints. Thirty unit tests.
- `apps/kt-api/` — Ktor 3.x + Exposed + kotlinx.coroutines. Thirteen endpoints. Seventeen unit tests.
Same API. Same behavior. Same JSON over the wire. You can swap one for the other transparently from the frontend. This is not overengineering — it is how you verify that your model is language-agnostic. If a bug appears in the Python backend but not the Kotlin one, it is an implementation bug. If it appears in both, it is a model bug. The twin structure is a debugging oracle.
The frontend — apps/web/, built on Vite + React 19 + React Router v7 (framework mode) + Tailwind 4 + shadcn/ui + TanStack Query 5 — doesn't know which backend it's talking to. It just speaks OpenAPI. Server-Sent Events stream solve progress to the UI live. Watch the solver chew through branches in real time. It is hypnotic.
The eighteen chapters
The path from "what is a constraint" to "benchmark a production solver against three engines" is eight phases:
| Phase | Focus | Chapters |
|---|---|---|
| 1 | Intuition + vocabulary | 1-3 |
| 2 | Core modelling | 4-6 |
| 3 | Bridges (MiniZinc to CP-SAT) | 7-8 |
| 4 | Scheduling primitives | 9-10 |
| 5 | Nurse Scheduling I / II / III | 11-13 |
| 6 | Your own DSL library | 14 |
| 7 | End-to-end NSP application | 15-17 |
| 8 | Ecosystem port | 18 |
Every chapter has the same shape: intuition, then formal math, then Python implementation, then Kotlin implementation, then MiniZinc variant (where it adds clarity), then exercises. You read it, you run it, you break it, you fix it, you move on.
The agentic knowledge base
Every area of the repo has an overview.md that acts as an always-loaded anchor for Claude Code, plus a CLAUDE.md that encodes workflow rules. A QMD MCP server indexes every markdown file in the repo into a searchable vector + BM25 database. A post-commit git hook re-indexes after every commit so queries never go stale. A launchd daemon keeps the indexer running.
The effect: you can ask the repo a question in natural language and get a cited answer drawn from any chapter, any spec, any benchmark, any ADR. The knowledge base builds itself as you write. And because every CLAUDE.md encodes the local workflow rules, the agent never drifts out of scope.
This is what modern agentic development actually looks like: not "prompt and pray," but a structured protocol where humans write specs, agents write code that conforms to specs, and a searchable knowledge base keeps both parties grounded in ground truth. If you want to set up this kind of environment yourself, I wrote Configure Claude Code for Maximum Power with the config I actually use. The verification gap between prototype and production — the thing tests and benchmarks exist to close — is the same gap every one of those 18 chapters is built to teach you how to close by hand.
PhD corner: the machinery under the hood
If you want to descend past the API surface, here is what is actually happening inside CP-SAT.
Propagation. For every constraint you post, the solver installs a propagator — a little specialized algorithm that knows how to shrink variable domains when other variables' domains shrink. Post AllDifferent([x, y, z]); when x becomes 3, the propagator removes 3 from y and z immediately. No search needed. Propagation does 99% of the work; search only happens in the gaps propagation can't close.
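The AllDifferent example can be acted out in plain Python with sets as domains. This is a toy illustration of the propagation loop, not CP-SAT's actual (much smarter) propagator:

```python
def propagate_all_different(domains):
    """Remove fixed values from all other domains until nothing changes."""
    changed = True
    while changed:
        changed = False
        for name, dom in domains.items():
            if len(dom) == 1:          # variable is fixed
                (v,) = dom
                for other, odom in domains.items():
                    if other != name and v in odom:
                        odom.discard(v)  # prune v everywhere else
                        changed = True
    return domains

# x is fixed to 3, so 3 is pruned from y and z with no search at all.
print(propagate_all_different({"x": {3}, "y": {1, 3}, "z": {2, 3}}))
# -> {'x': {3}, 'y': {1}, 'z': {2}}
```

Note the cascade: pruning 3 fixes y and z too, and the loop keeps running until the domains are stable — the fixpoint computation at the heart of every CP solver.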
Clause learning. When the solver backtracks from a failure, it analyzes why the failure happened and synthesizes a new constraint — a "learned clause" — that prevents re-entering the same failed region of the search tree. This is imported directly from modern SAT solvers (CDCL, conflict-driven clause learning). It is the single biggest reason CP-SAT buries pre-2010 CP solvers on hard instances.
Lazy clause generation (LCG). The deepest trick. Instead of picking one representation (SAT clauses or CP domains), the solver maintains both simultaneously and translates between them on demand. When propagation infers a fact in CP-land, it lazily generates the SAT clauses that explain the fact — which then feed the clause learner. This is why CP-SAT is called a hybrid solver: it is literally a CP engine and a SAT solver sharing a belief state.
Symmetry breaking. Scheduling problems are riddled with symmetries — swap any two identical nurses and you get an equivalent solution, and a naive solver would waste time exploring both. You add symmetry-breaking constraints that force a canonical ordering, pruning the duplicates before search begins.
Global constraints. AllDifferent, Cumulative, NoOverlap, Circuit, Element — these are not syntactic sugar. They have dedicated O(n log n) or polynomial propagators that do inference thousands of times more cheaply than the equivalent decomposition into binary constraints. Modelling with global constraints is how you make a 10⁶⁰⁰⁰ search space tractable.
Sixty years of algorithmic cleverness compounded into a single hybrid engine. The repo covers these mechanisms over chapters 4-10 with working examples of each.
Why this matters
Every production system that makes discrete decisions at scale — flight crew scheduling, hospital rostering, warehouse picking, vehicle routing, manufacturing line balancing, exam timetabling, satellite tasking, sports fixtures — runs on some variant of this machinery. The tradition predates neural networks. It coexists with them (hybrids are an active research front). It solves problems that cannot be solved by gradient descent because the solution space is discrete and the constraints are hard, not soft.
It is, in the most literal sense, the field that schedules your world.
And almost no one talks about it.
Go dig
The repo is github.com/vpetreski/cp-deep-dive. Public, Apache-2.0-licensed, still in active development: the v0.1 scaffold is complete and the chapters are filling in. If you are the kind of person who enjoys the feeling of watching a solver narrow a 10⁶⁰⁰⁰ space into a single provably-optimal answer in 200 milliseconds, you will enjoy the trapdoor I went through.
Start with docs/overview.md. Then docs/chapters/01-what-is-cp.md. Then keep going. The whole thing is designed to be readable top-to-bottom in about a hundred hours. Put in the time and you come out the other side fluent in a branch of AI most engineers don't know exists — and holding a Kotlin DSL you can ship to production.
The hollow floorboard is just sitting there. Pull it up.