Sinapt: The Architecture Before the Code

After three Sinapt manifestos, the architecture has answers. Two layers locked: retrieval and contract surfaces. The PoC has not run. Dogfood has not run. Not a victory lap — but the architecture has stopped being hand-wavy where hand-wavy kills the product.


This is research in progress.

The first three Sinapt pieces did the manifesto work: the original synthesis named the cockpit fatigue, the second piece split the knowledge base from the future cockpit, and the third framed the market as the queryable company. Your knowledge is everywhere and queryable nowhere. Agents rebuild context from scratch. Companies pay the knowledge tax because the institutional memory layer does not exist in a form agents can use.

This is not that argument again.

This is the build update after several architecture passes. Two layers are now locked: the retrieval layer and the contract surfaces. The PoC has not run. Dogfood has not run. Nothing here is a victory lap. But the architecture has stopped being hand-wavy in the places where hand-wavy kills the product.

The hard part is not picking a vector database. That is the part everyone wants to argue about because it is legible. Pinecone, Weaviate, Turbopuffer, pgvector, OpenSearch, Qdrant. Fine. Useful debate. Not the core.

The hard part is making a company's knowledge retrievable by agents without leaking what they should not see, while exposing the right contracts to the right callers.

That is the product.

[Diagram 1 — ingest path. How content gets into Sinapt: pluggable connectors (Slack, Granola, GitHub, Linear, Notion / Drive, Filesystem / Gmail, more — extensible) plus agent writes from Diagram 2 → S3 canonical markdown, the source of truth → AST-aware chunker → contextualizer (Claude Haiku 4.5) → embedder (Voyage voyage-3-large) → Turbopuffer. One write path; the same Turbopuffer namespace serves read queries later.]
Diagram 1 — ingest path. Pluggable connectors emit canonical markdown into S3, the source of truth. Agent-written feedback from Diagram 2 enters through the same pipeline as another source (yellow-bordered box). From S3 the indexing pipeline — chunker, contextualizer, embedder — writes chunks and vectors into Turbopuffer (the box at the center of Diagram 2).
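
The shape of that pipeline is easier to see as code than as prose. A minimal sketch, not Sinapt's real interfaces: the AST-aware chunker is reduced to a heading split, and the contextualizer, embedder, and index writer are stand-in callables.

```python
# Minimal sketch of the ingest shape in Diagram 1. All helper names
# (contextualize, embed, index_write) are hypothetical stand-ins, not
# Sinapt's real interfaces; the AST-aware chunker is reduced to a
# heading split for illustration.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str                       # original chunk text from the canonical markdown
    context: str = ""               # 50-100 token summary prepended at indexing time
    vector: list[float] | None = None

def chunk_markdown(doc_id: str, markdown: str) -> list[Chunk]:
    # Stand-in for the AST-aware chunker: split on top-level headings.
    sections = [s for s in markdown.split("\n## ") if s.strip()]
    return [Chunk(doc_id=doc_id, text=s) for s in sections]

def ingest(doc_id: str, markdown: str, contextualize, embed, index_write) -> None:
    """S3 canonical markdown -> chunks -> context -> vectors -> index."""
    chunks = chunk_markdown(doc_id, markdown)
    for c in chunks:
        # Contextual retrieval: prepend a short, whole-doc-aware summary
        # (the Haiku job in the diagram) before embedding.
        c.context = contextualize(doc=markdown, chunk=c.text)
    vectors = embed([f"{c.context}\n\n{c.text}" for c in chunks])
    for c, v in zip(chunks, vectors):
        c.vector = v
    # One write path: the same namespace serves reads at query time.
    index_write(chunks)
```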

That single pipeline produces a canonical markdown corpus in S3 and an indexed-and-embedded version inside Turbopuffer. The next diagram shows what happens at query time — the most consequential moment in the whole system, because that is where permissions get enforced.

[Diagram 2 — policy + retrieval. The cardinal invariant: every query goes through Aurora's ACL filter BEFORE Turbopuffer sees a chunk. Queries arrive from the MCP server and REST API (Diagram 3) at Aurora PostgreSQL (control plane: policy + ACL + audit; ACL snapshots from connectors flow here in parallel). Aurora compiles the ACL filter before retrieval; Turbopuffer (hybrid BM25 + dense + RRF, namespace-per-tenant, one shared index) returns the top-100 ACL-filtered candidates; Voyage rerank-2.5 (cross-encoder) reorders top-100 → top-N; authorized chunks + citations return to whichever surface originated the query (MCP / REST / CLI / Web UI — authorization is uniform), with a trace ID and audit record for every query. Feedback loop: the agent reads chunks, writes notes / patches back to S3 in Diagram 1, and the writes are re-ingested and queryable on the next read. The Embedder in Diagram 1 writes chunks + vectors into the Turbopuffer box here.]
Diagram 2 — policy and retrieval. The cardinal invariant: Aurora compiles the ACL filter and applies it BEFORE Turbopuffer retrieves anything. Top indicator shows where writes from the Embedder (Diagram 1) enter. Bottom indicator shows where queries from MCP / REST (Diagram 3) enter and where authorized chunks are returned. The yellow feedback loop on the left writes back to S3 in Diagram 1.

Every query crosses Aurora before Turbopuffer sees a single chunk. The reranker only operates on authorized candidates. The audit log records every step. The third diagram shows the four caller-shaped surfaces that originate those queries — and how they all share one authorization spine.

[Diagram 3 — query surfaces. One authorization spine, four caller-shaped surfaces. MCP server (Streamable HTTP + OAuth 2.1; chat-session agents such as Claude Code and Cursor) and REST /v1/* (OpenAPI + auto-generated SDKs; durable software: CI, Lambda, services) send queries to Aurora's policy compiler (the orange box in Diagram 2). The sinapt CLI (cross-platform Go binary; developers, terminals, scripts, hooks) and the Web UI (admin + curation + billing + audit; not the daily query path) call REST. Aurora returns authorized chunks + citations to whichever surface originated the query; every call gets a trace ID.]
Diagram 3 — query surfaces. Four caller-shaped surfaces, one authorization spine. MCP server and REST go directly up to Aurora's policy compiler (the orange box in Diagram 2); CLI and Web UI call REST under the hood. Authorization is uniform; surface differences are presentation only.

Four surfaces, one policy compiler, no exceptions. The roadmap below maps Layer 2 (search engine) and Layer 5 (contract surfaces) — the two layers locked in this Phase 1 architecture lock-in — against the eight pending layers that complete the full build plan.

Roadmap layers — the 10-layer architecture roadmap for Sinapt. Layer 2 + Layer 5 locked; the rest pending.

✓ Locked
  • Layer 2 — Search engine: Turbopuffer hybrid retrieval (BM25 + dense + RRF) + Voyage reranker (top-100 → top-N) + Claude Haiku 4.5 contextualizer
  • Layer 5 — Contract surfaces: MCP + REST + CLI + Web UI sharing one Aurora policy compiler (authorization is uniform across all four)

Pending / supporting
  • L0 Project tools — GitHub Org, Linear, Slack
  • L1 AWS infrastructure
  • L3 Storage — S3 + Aurora data model
  • L4 Backend services
  • L6 Connectors + workflow engine
  • L7 Auth + teams + permissions (MVP)
  • L8 Admin Web UI (MVP)
  • L9 Strategic planning (cross-cutting)
The 10-layer architecture roadmap. Layer 2 (search engine) and Layer 5 (contract surfaces) are locked in this Phase 1 architecture lock-in. The eight pending layers — project tools, AWS infrastructure, storage data model, backend services, connectors and workflow, auth and teams, admin Web UI, strategic planning — are still in flight.

Retrieval Quality Is Not an MVP Upgrade

In the second Sinapt post, QMD was still the candidate cloud engine. That was correct for the thinking stage and wrong for the product stage. The architecture has moved.

Sinapt's cloud retrieval layer is now a Floor 4 RAG stack, in the language of the RAG, From Crayons to PhD piece: hybrid retrieval, dense embeddings, reranking, contextualization, evaluation, permissions, and audit as one system. Not a vector search endpoint with a logo.

Quick gloss for the technical terms ahead. RAG = retrieval-augmented generation — the standard pattern of "search a knowledge base, hand the top results to an LLM, let it answer." Floor 4 in the Crayons-to-PhD framework is the 2026 production baseline: hybrid keyword + vector search, plus a reranker on top, plus contextual retrieval at indexing time. Each layer covers a different failure mode. Below Floor 4 is the demo-RAG zone where most teams stall.

The public shape is locked:

  • Turbopuffer — the search engine. It runs two kinds of search at the same time: classic keyword search (BM25 — the same family of math behind Elasticsearch and ranking in Google's early days, great when the exact word matches) and vector search (semantic similarity — finds chunks that mean roughly the same thing as the query, even with different words). The two rankings get fused with reciprocal rank fusion (RRF) — a simple math trick that combines both lists into one ordered set, so you get the best of keyword precision and meaning-based recall. Namespace-per-tenant means each customer's index is physically isolated — no shared tables, no cross-tenant leakage by accident. (A vendor-neutral sketch of RRF follows this list.)
  • Voyage voyage-3-large — the embedder. Embeddings are how an AI represents text as numbers: each chunk becomes a 1,024-dimensional vector, and two chunks with similar meaning end up close together in that vector space. This is what makes "find chunks that mean roughly the same thing" possible without exact keyword matches. Best-in-class on retrieval benchmarks as of 2026.
  • Voyage rerank-2.5 — the reranker. After the initial hybrid search returns the top ~100 candidates, the reranker is a second-pass model that re-orders them with deeper understanding — it actually reads the query and each candidate together (a cross-encoder), instead of just comparing precomputed vectors. The result: the genuinely most useful chunk lands at the top of the top-N returned to the agent. Enabled from PoC day 1 — not deferred.
  • Claude Haiku 4.5 — the indexing-time intelligence. Does two jobs: (1) contextual retrieval — Anthropic's trick of prepending a 50-100 token summary to each chunk before indexing ("this chunk is from the Q2 planning doc, discussing the auth migration timeline"). Cuts retrieval failures by ~49%, or 67% when stacked with the reranker. (2) Policy-gated query rewrite — when the agent's question is vague, Haiku rewrites it into something more retrievable, but only within scopes the caller is allowed to see.
  • Amazon S3 — the source of truth. Every piece of knowledge ends up as canonical markdown in S3 before it's chunked, embedded, or indexed anywhere. The indexes are derivative — you can rebuild them from S3 anytime, or export the full KB as plain .md files. "Never used for training" is auditable because S3 is the only durable copy and it's encrypted with your KMS key.
  • Aurora PostgreSQL Serverless v2 — the control plane. The boring database that owns the dangerous data: tenants, users, group memberships, collection ACLs, policy state, audit log. The retrieval engine never owns permissions. Aurora compiles the ACL filter (ACL = access control list — the rules for who can see what) for every query before it touches Turbopuffer.
  • ECS Fargate, SQS, and EventBridge — the runtime plumbing. Fargate runs the REST + MCP services as containers without managing servers. SQS is the job queue for ingestion work. EventBridge fires events when sources change (a GitHub PR merges, a Linear ticket gets archived, a Granola transcript drops). Same deploy shape for one user and for enterprise — no "scale by rewriting."
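
Of everything in that list, RRF is the one piece small enough to show whole. A vendor-neutral sketch of the fusion step, assuming nothing beyond two ranked lists of chunk IDs:

```python
# Reciprocal rank fusion (RRF), as described in the Turbopuffer bullet above.
# Vendor-neutral sketch: each input ranking is a list of chunk IDs, best first.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings into one ordered list. k=60 is the commonly used constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); chunks that sit high in either
            # the keyword ranking or the vector ranking float to the top.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Usage: fused = rrf_fuse([bm25_ids, dense_ids])[:100]  # top-100 go to the reranker
```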

That list is not meant to impress anyone. It is there because architecture without named parts is usually theater. The point is the retrieval shape, not vendor worship.

The unit Sinapt has to return is not "a similar chunk." It is the right chunk, from the right source, under the right permission boundary, with enough surrounding context for an agent to act on it and enough audit trail for a company to explain why it was visible.

This is why retrieval quality cannot be deferred to "after MVP." For a normal SaaS product, the first version can have a rough search box. Users forgive it because search is an accessory. In Sinapt, retrieval is the work. The daily user is not a human browsing pages. The daily user is an agent deciding whether a design decision exists, what the current policy says, which customer constraint matters, whether a code change violates a prior decision, or what happened in the meeting nobody wrote up.

Bad retrieval does not feel like bad search. It feels like a smart agent making a bad decision with confidence.

Retrieval shape is the moat. The engine is the commodity. Sinapt's job is not to ship the world's fastest vector database. It is to ship the smallest production-grade retrieval contract — hybrid + reranker + contextual retrieval + ACL-aware filter + audit — and then defend that shape relentlessly. The vendors named above earn their slots; they are also replaceable in principle. The shape is not.

So the PoC does not get to pass because it can answer a few friendly demo questions. It has to run against a golden set from my own corpus, with QMD as the local baseline. If the cloud layer cannot match the local engine on the corpus that created Sinapt in the first place, the architecture is not ready to become a company product.

Permission-Aware Retrieval Is the Hard Part

The central invariant is simple:

You cannot leak a chunk the user was not allowed to read.

Everything else is implementation detail around that sentence.

The obvious failure mode in agentic RAG is to retrieve broadly, rerank broadly, and then filter late. That is wrong. If the reranker sees unauthorized chunks, the system has already failed. Maybe it does not return them. Maybe the final answer omits them. It still processed knowledge the caller was not allowed to access. In a real company, that is not a harmless internal detail. That is a permission leak with better UX.

Sinapt's retrieval path is locked around the opposite order.

  1. Caller identity arrives. The request shows up with everything the policy compiler needs to know: user, workspace, tenant, which surface (MCP / REST / CLI / Web UI), requested scope, and whatever delegated authority applies to that request.
  2. Aurora compiles the effective permission set. The control plane turns the caller's identity plus organization state plus source-level / collection-level / document-level ACLs into a single retrieval filter the engine can enforce. I'm not publishing the schema or compiler internals, but the responsibility is clear: produce one filter that captures every rule that applies to this caller right now. (An illustrative sketch of this shape follows the list.)
  3. The filter is applied before Turbopuffer retrieval. Not after. Before. Unauthorized chunks are not candidates. They do not enter the top-K pool. They do not become reranker inputs. They do not get summarized away into an answer that "probably" hides them. The search engine literally never touches a chunk the caller wasn't allowed to read.
  4. Voyage reranking only sees authorized candidates. The reranker can be very good at ranking quality because the policy layer has already reduced the world to what this caller can actually read. Quality and authorization are the same path, not two tracks.
  5. The request is audited. Not as a compliance afterthought, but as part of the control plane: who asked, through which surface, under which scope, which sources were eligible, which chunks were used, what answer path was taken. The audit log is queryable — both for the customer's own compliance review and for debugging "why did I see this / why did I not see this."
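
To make the ordering concrete, here is an illustrative sketch of that path. It is not the real compiler or schema (those are not published); the toy grants model exists only to show where the filter is built and where it is applied.

```python
# Illustration of the order of operations, not Sinapt's real compiler or schema.
# The point is structural: the ACL filter is built from the control plane and
# handed to the search engine as a first-stage constraint, so unauthorized
# chunks never become candidates.
from dataclasses import dataclass

@dataclass
class Caller:
    user_id: str
    tenant_id: str
    surface: str   # "mcp" | "rest" | "cli" | "web"

def compile_acl_filter(caller: Caller, grants: dict[str, set[str]]) -> dict:
    """Collapse identity + org state + ACLs into one retrieval filter (toy model)."""
    # grants: collection_id -> set of user_ids allowed to read that collection.
    readable = [cid for cid, users in grants.items() if caller.user_id in users]
    return {"tenant": caller.tenant_id, "collection_in": readable}

def retrieve(query: str, caller: Caller, grants, search, rerank, audit):
    acl_filter = compile_acl_filter(caller, grants)           # steps 1-2: compile first
    candidates = search(query, filter=acl_filter, top_k=100)  # step 3: filter BEFORE retrieval
    top_n = rerank(query, candidates)[:10]                    # step 4: rerank only authorized chunks
    audit(caller=caller, surface=caller.surface, query=query,
          acl_filter=acl_filter, used=top_n)                  # step 5: record the whole path
    return top_n
```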

That is the Glean invariant worth copying: the permission filter sits before retrieval. Search quality and authorization are not independent tracks. They are one path.

This also changes how query rewrite works. A model can help expand a vague query into something retrievable. It can add synonyms. It can infer that "the thing we decided in the April planning call" probably means a meeting transcript, a Linear thread, and a planning doc. But rewrite cannot become a jailbreak against scope. Haiku's job in the locked design is contextualization and policy-gated rewrite, not free-form exploration of the whole tenant.

The architecture has to assume agents will ask broad questions. That is their job. "Prepare me for this customer call." "Find every prior decision about billing." "Tell me what I need before touching the auth service." Those are not neat search queries. They are work requests. The system has to translate them into retrieval without letting broad language become broad access.

Four Surfaces, One Authorization Spine

The second locked layer is the surface contract. This matters because I got closer to the trap while designing Sinapt than I want to admit.

The trap is treating MCP as the universal API because MCP is exciting and agents are the primary users. That is almost right, which makes it dangerous.

The better rule comes from the MCP protocol piece: caller lifetime decides the surface.

If the caller is a live chat-session agent, MCP is the right contract. The model needs tools, resources, prompts, and a protocol it can reason about inside the session. The locked production design uses Streamable HTTP (the modern remote transport — bi-directional streaming over HTTP, replacing the earlier HTTP+SSE MCP transport) over OAuth 2.1 (the standard for delegated user authorization — the agent acts on the user's behalf, with scoped access the user explicitly granted). Not stdio. Stdio (the local pipe transport) is fine for local development and adapter workflows but doesn't belong in production cloud SaaS.

For the PoC, the MCP catalog is intentionally small. The MCP protocol defines three primitives a server can expose to AI clients: tools (actions the agent can invoke — create issue, search, send message), resources (named bits of content the agent can read — a document, a chunk, your audit log), and prompts (reusable prompt templates the server suggests for common workflows). Sinapt's PoC ships four tools: sinapt_answer, sinapt_retrieve_context, sinapt_prepare_agent_brief, and sinapt_save_note. The MVP target is eight tools, six resources, and four prompts. There is a hard catalog budget because bad MCP servers become tool junk drawers — eighty-seven vaguely-named tools are worse than four sharp ones, because the model spends context window searching the menu instead of solving the problem.
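
For concreteness, this is roughly what a four-tool catalog looks like in MCP's tool shape: a name, a description, and a JSON Schema for inputs. The descriptions and parameters below are illustrative guesses, not Sinapt's published contracts.

```python
# The four PoC tools expressed in MCP's tool shape (name, description,
# inputSchema as JSON Schema). Descriptions and parameters are illustrative.
POC_TOOLS = [
    {
        "name": "sinapt_answer",
        "description": "Answer a question from the knowledge base, with citations.",
        "inputSchema": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
    },
    {
        "name": "sinapt_retrieve_context",
        "description": "Return raw authorized chunks for a query, no synthesis.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}, "top_n": {"type": "integer"}},
            "required": ["query"],
        },
    },
    {
        "name": "sinapt_prepare_agent_brief",
        "description": "Assemble a context pack for a task before the agent starts work.",
        "inputSchema": {
            "type": "object",
            "properties": {"task": {"type": "string"}},
            "required": ["task"],
        },
    },
    {
        "name": "sinapt_save_note",
        "description": "Append a session discovery to the knowledge base with provenance.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "note": {"type": "string"},
                "citations": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["note"],
        },
    },
]
```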

If the caller is durable software, REST is the right contract. CI jobs, Lambda functions, web admin flows, integrations, scheduled workers, the CLI itself: these should not pretend to be chat agents. They need stable endpoints, idempotency (re-sending the same request with the same ID has the same effect — critical for retry-safe automation), SDKs auto-generated from OpenAPI (the machine-readable spec for REST APIs that lets every language generate a typed client from one source file), retry semantics, typed errors, and predictable schemas that survive across releases. That is not MCP's job.
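
A sketch of what a durable caller looks like against that contract. The endpoint path and header name are assumptions for illustration; the pattern is the point: an idempotency key makes retries safe, and the generated SDK would wrap exactly this shape.

```python
# How a durable caller (CI job, Lambda, scheduled worker) would hit the REST
# surface. The /v1/answer path, header names, and payload are assumptions,
# not a published Sinapt spec; the idempotency pattern is the point.
import uuid
import requests

def ask_sinapt(question: str, api_base: str, token: str) -> dict:
    idempotency_key = str(uuid.uuid4())  # re-sending with the same key has the same effect
    resp = requests.post(
        f"{api_base}/v1/answer",                      # hypothetical endpoint
        headers={
            "Authorization": f"Bearer {token}",
            "Idempotency-Key": idempotency_key,       # retry-safe automation
        },
        json={"question": question},
        timeout=30,
    )
    resp.raise_for_status()   # a typed SDK would surface structured errors here
    return resp.json()        # expected: answer + citations + trace ID
```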

This resolves the Linear / Claude Code trap. During a live Claude Code session, the agent can use Sinapt through MCP because the interaction is session-scoped. But code that runs after the chat ends cannot depend on an MCP conversation still existing. The durable integration uses REST.

The CLI consumes REST. sinapt mcp serve exists only as a local-development stdio proxy for tools that need that shape. It is not the production architecture.

The web UI owns admin, curation, billing, and audit. It is not the daily query path. This was already the direction in the previous Sinapt posts, but the surface lock makes it stricter. Humans need a place to inspect collections, review sources, invite teammates, configure billing, and understand why an answer cited what it cited. That does not make the web app the product. The product is the knowledge layer agents call.

The important part: all four surfaces share one Aurora policy compiler.

MCP does not get its own authorization logic. REST does not get a looser path because "it is internal." The CLI does not bypass policy because it feels local. The web admin does not query a side channel because it is convenient.

Surfaces differ in presentation; authorization is uniform.

That sentence is the architecture. The rest is plumbing.

The Loop, the Connectors, and the Open Merge Problem

Two things are missing from the diagrams above. Both matter.

Pluggable connectors, not a fixed list of six

The six connectors shown — Slack, Granola, GitHub, Linear, Notion / Drive, Filesystem / Gmail — are examples, not the full list. Layer 6 of Sinapt's architecture is the connector framework: a source / sink abstraction (each integration implements a contract), an event-handling pattern (webhook receivers vs polling workers vs scheduled jobs), and a workflow engine that orchestrates ingest jobs. New source types — Confluence, Zendesk, S3 buckets, custom internal databases, whatever — plug into the same shape: emit canonical markdown, declare ACL semantics, register on the event bus. The PoC starts with one trivial connector (filesystem watcher); the MVP adds GitHub + Slack + Linear; later phases add the rest. The architecture is not "Sinapt supports these six things." It is "Sinapt is the framework these six things plug into, plus the next sixty."
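
A sketch of what that source contract could look like, with the trivial filesystem connector as the first implementation. All names are hypothetical; the real Layer 6 framework is not published.

```python
# Illustrative shape of the Layer 6 connector contract: emit canonical markdown,
# declare ACL semantics, handle change events. Names are hypothetical.
from dataclasses import dataclass, field
from typing import Iterable, Protocol
import pathlib

@dataclass
class SourceDocument:
    source: str                 # "slack", "github", "filesystem", ...
    doc_id: str
    markdown: str               # canonical markdown emitted by the connector
    acl: dict = field(default_factory=dict)  # source-native grants, normalized later

class Connector(Protocol):
    def backfill(self) -> Iterable[SourceDocument]:
        """Full historical sync on first install."""
        ...
    def on_event(self, event: dict) -> Iterable[SourceDocument]:
        """Webhook / poll / scheduled delta: yield the documents that changed."""
        ...

class FilesystemConnector:
    """The trivial PoC connector: a directory of .md files."""
    def __init__(self, root: str):
        self.root = root
    def backfill(self) -> Iterable[SourceDocument]:
        for path in pathlib.Path(self.root).rglob("*.md"):
            yield SourceDocument("filesystem", str(path), path.read_text(),
                                 acl={"owner": "local"})
    def on_event(self, event: dict) -> Iterable[SourceDocument]:
        path = pathlib.Path(event["path"])
        yield SourceDocument("filesystem", str(path), path.read_text(),
                             acl={"owner": "local"})
```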

Ingest is a loop, not a pipeline

The diagram shows arrows flowing top-down — sources → S3 → chunker → embedder → Turbopuffer → reranker → authorized chunks. That is half the truth. The actual system is a cycle, not a pipeline.

Agents don't just read. They write. When an agent uses sinapt_save_note to capture a session discovery, when a curator uses sinapt_propose_doc_patch to update a stale design doc, when a summary of a retrieved context pack gets archived as new canonical content — every one of those operations becomes a new source in the very same ingest pipeline that produced the context it was generated from. Write goes through chunker → contextualizer → embedder → Turbopuffer. The next agent session retrieves the new write the same way it retrieves a Slack message. The yellow feedback loop on the left side of the diagram is that cycle.

This is what makes Sinapt a knowledge base, not a search engine. Search engines only read. Sinapt reads, writes, and re-reads what it wrote — with full provenance, full audit, full permission scoping, every loop.

And the open problem that loop creates: multi-writer merge

Here is the honest part. We do not have a merge story yet.

In code, multiple people working on the same file is a solved problem — git gives us branches, three-way merges, blame, pull requests, review queues. The discipline is decades old. In a canonical-markdown KB written to by both humans and agents, none of that infrastructure exists out of the box. Two agents in two sessions both decide to update the same architecture decision doc. A Slack reindex refreshes a thread-derived section while an agent rewrites the same section as policy guidance. An admin in the Web UI edits a heading while a CLI script is mid-patch. Last-writer-wins is acceptable for a cache; it is not acceptable for canonical knowledge.

The plan in three phases — honest about what is deferred:

PoC: keep canonical writes minimal. sinapt_save_note is append-only with full provenance (session, agent identity, timestamp, citations). Agents never overwrite canonical docs. Connector ingests update source mirrors only. The merge problem is avoided, not solved.

MVP: sinapt_propose_doc_patch becomes the primary write path. Every patch targets a base revision, a section anchor, and a source revision vector. Optimistic locking: if the base still matches when the patch arrives, apply it; if not, attempt a structured merge at the heading level; if the same claim changed both ways, surface a conflict for human review in the Web UI. Auto-apply only fresh, non-overlapping, low-risk patches.

Production: canonical markdown lives in Git underneath. Sinapt does not expose Git ceremony to agents — Git handles the version mechanics (commits, merge bases, branches, blame, reverts, conflicts) and Sinapt provides the agent-native review layer. Important docs have explicit owners. Trivial edits auto-merge. High-stakes content goes through human review.
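
A sketch of the MVP optimistic-locking decision from the phase plan above. The helper names are hypothetical; the point is the three-way outcome: apply, structured merge, or human review.

```python
# Sketch of the MVP write path's optimistic-locking decision. Names and the
# overlap check are hypothetical stand-ins, not Sinapt's real patch engine.
from dataclasses import dataclass

@dataclass
class DocPatch:
    doc_id: str
    base_revision: str      # revision the patch was written against
    section_anchor: str     # heading the patch targets
    new_text: str

def resolve_patch(patch: DocPatch, current_revision: str, sections_overlap) -> str:
    """Decide what happens when an agent's patch arrives."""
    if patch.base_revision == current_revision:
        return "apply"                  # base unchanged: safe to write directly
    if not sections_overlap(patch.section_anchor):
        return "auto_merge"             # concurrent edit touched other headings only
    return "needs_human_review"         # same section changed both ways: surface a conflict
```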

What competitors do today: mostly punt. Glean's connectors fetch and index source content, but writes go through the source's own conflict semantics (GitHub writes land as SHA-validated commits, Notion page edits go through Notion's own version model). Notion AI's agent can create and edit pages, but only inside Notion's existing system of record. Pinecone Assistant is a retrieval corpus with upload + replace + delete, not a collaborative canonical editing layer.

Nobody has a clean agent-native merge story yet. We are not pretending to. The build update is honest about that.

QMD Became the Teacher, Not the Engine

QMD deserves a clean correction because it shaped the original Sinapt thinking.

Locally, QMD still works. It is the reason my markdown knowledge base became usable by agents in the first place. It showed the daily behavior: short overview.md files, deeper markdown documents, hybrid search, agents querying context on demand instead of loading the world.

But QMD is no longer the Sinapt cloud engine.

Its role is now threefold.

  1. Local markdown engine. QMD remains my personal retrieval engine for everything that lives on disk — a private markdown corpus — short overview files, deeper documents, and local-only notes. That stack is real, useful, and not going anywhere. QMD does pure BM25 (Floor 1 in the RAG framework) brilliantly on local markdown.
  2. Evaluation oracle for the Phase 2 PoC. Sinapt's cloud retrieval has to match the QMD golden-set baseline on my own corpus. Not sentimentally — mechanically. Same corpus, same hard questions, scored against expected sources and acceptable answers. If the cloud layer can't beat the local engine on the corpus that created Sinapt in the first place, the architecture is not ready to become a company product.
  3. Possible OSS lead magnet. QMD may become an open-source release later — a useful local tool, an honest bridge into the Sinapt worldview, no vendor lock-in story to fake. That decision is optional and later.

But the cloud engine is now the Turbopuffer + Voyage + Haiku + S3 + Aurora stack. That is the product path. QMD was the teacher.

Why This Is a Product, Not a Wrapper Around Five Vendors

📋 The short version:

The vendors are commodity. Anyone can wire Turbopuffer + Voyage + Haiku + S3 + Aurora into a working RAG pipeline in a weekend. Notion's enterprise search already uses Turbopuffer underneath. The stack is not the moat.

The moat is operational, not algorithmic. Three things take years to get right: cross-source permission compilation (Slack ≠ GitHub ≠ Notion permissions, all merged into one query-time ACL filter), pre-retrieval ACL (not post-filter — the Glean invariant), and agent-shaped retrieval (10-100× human query frequency, action-loop traces, audit as product).

Sinapt wins where you are agent-first across mixed stacks. It loses to Glean for human enterprise search, to Microsoft inside M365, to off-the-shelf managed RAG for small single-source teams. The product is precisely the intersection.

Let's be honest about what's actually new here.

The individual components are commodity. Notion's enterprise search already runs on Turbopuffer underneath. Pinecone Assistant abstracts the same kind of stack into a single managed service. Glean shipped an MCP server in March 2026 that exposes permission-aware enterprise search to any AI host. The vendors are not the moat.

So what is?

The moat starts past the demo line — where every part of the stack still works but the integration breaks in ways that take quarters to fix. Three honest claims.

One: cross-source ACL propagation. A real company runs on a dozen permission models at once. Slack channel membership, GitHub org / team / repo grants, Notion page inheritance, Google Drive link sharing, Linear project roles, S3 bucket policies — these are different languages with different semantics. The moment a Slack message gets moved to a private channel, a GitHub repo gets archived, or a Notion page loses an editor, the retrieval layer has to know in seconds. Pinecone, Turbopuffer, and Voyage are explicit that this is the application's problem to solve — they give you filterable metadata, not an ACL system. Compiling every source's permissions into a unified query-time filter and keeping it fresh as ACLs change is engineering scar tissue, not a vendor SKU.

Two: pre-retrieval ACL, not post-filter. Most "secure RAG" tutorials retrieve broadly and filter the results before showing them to the user. That is both a security bug (unauthorized chunks reach the reranker, the prompt, the trace, the model context) and a quality bug (top-K collapses after the filter strips out half the results). Sinapt's invariant — the reranker never sees a chunk the caller isn't allowed to read — requires the filter compiled in Aurora and pushed into Turbopuffer's first-stage retrieval, not into a post-processing pass. Glean has the same invariant. Almost no DIY RAG stack does.

Three: agent-shaped retrieval, not enterprise search retrofitted. Humans run one search per session. Agents run 10 to 100. Recent estimates put roughly 85% of an agent's effort on knowledge retrieval rather than reasoning. The shape of the system has to assume that — tool catalog discipline (four tools at PoC, not eighty-seven), trace IDs that survive across retrieve → rerank → answer → action, audit records that answer "was this allowed, why, under which policy state, what evidence did the agent use before acting" — not just access logs. Eval (golden sets, recall regressions, reranker comparisons, abstention behavior) bundled in, not bolted on as a future enterprise feature.
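
Claim one is the easiest to make concrete. A deliberately simplified sketch of the normalization step: every source's native permission model has to collapse into one grant vocabulary before a single query-time filter can be compiled. The mappings below are illustrations, not the real normalizer.

```python
# Why cross-source ACL propagation is scar tissue rather than a vendor SKU:
# every source speaks a different permission language, and all of them have to
# normalize into one grant shape before one query-time filter can be compiled.
# These mappings are deliberately simplified illustrations.
def normalize_grants(source: str, raw: dict) -> set[tuple[str, str]]:
    """Return (principal, doc_id) read grants in one shared vocabulary."""
    if source == "slack":
        # Channel membership: every member can read every message in the channel.
        return {(user, msg) for user in raw["channel_members"] for msg in raw["message_ids"]}
    if source == "github":
        # Org / team / repo grants flattened down to users with read on the repo.
        return {(user, doc) for user in raw["repo_readers"] for doc in raw["doc_ids"]}
    if source == "notion":
        # Page inheritance: a page inherits its parent's readers unless overridden.
        readers = raw.get("page_readers") or raw["parent_readers"]
        return {(user, raw["page_id"]) for user in readers}
    raise ValueError(f"no ACL mapping for source {source!r}")
```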

The honest competitive landscape:

  • Glean (now with an MCP server, March 2026) — closest match. Permission-aware enterprise search with an agent surface. Glean's center of gravity is enterprise search; Sinapt's proposed center of gravity is agent-native retrieval. Sinapt wins where the agent is the primary user and the UI is the side surface, not the reverse.
  • Microsoft Copilot for M365 — dominant inside the Microsoft Graph with 100+ available connectors. If your company lives in M365, Microsoft already owns this layer. Sinapt is for mixed stacks — GitHub-heavy, Slack-native, multi-tool teams that don't accept Microsoft as the agent shell.
  • Pinecone Assistant + Pinecone Nexus — vector DBs moving up-stack. Pinecone Nexus reframes the future as a "compilation-stage knowledge layer" with a declarative query language for typed, governed answers — direct competition for generic RAG wrappers. Public positioning emphasizes governed knowledge for agents; it is not a collaborative canonical-markdown layer with cross-source ACL compilation.
  • Notion AI, Claude Skills, Cursor KB — strong inside the host app and its native connectors. Notion AI Connectors do reach Slack / Drive / Jira / Gmail / GitHub / Linear, so the "useless across stacks" framing is unfair. The actual gap: not a neutral canonical layer across the whole company stack, and not designed as a substrate other agent runtimes can build on. Skills package agent instructions; Sinapt is the substrate Skills can stand on.
  • DIY RAG — Gartner pegs the GenAI PoC-to-production abandonment rate at around 50%. The pattern: teams treat retrieval as a model call instead of an owned data product. Sinapt's bet is that owning the retrieval contract is more durable than owning the model.

The moat is operational, not algorithmic. It is made of integration scars, regression tests for permission bugs, connector freshness tickets, eval harnesses, and an audit trail that customers can replay during a security review. Strong for regulated, multi-source, agent-heavy teams. Weak for small teams with one document store and no ACL complexity — for them, off-the-shelf managed RAG is the right answer.

That is the honest sell. Sinapt is not a vector database with marketing. It is the data-and-policy product around the vector database — which is the part that actually takes years to get right.

The one-line answer

The moat is agent-first across multi-source data with permission rigor — three legs, all needed.

Pull any one leg out and Sinapt becomes a different product:

• Without agent-first, you have Glean — built for human enterprise search; the tool catalog discipline, trace continuity across action loops, and audit-as-product all stop mattering when the primary user is a human typing into a search box.

• Without multi-source, you have Notion AI — excellent inside Notion's own data, useless the moment your real work also lives in Slack, GitHub, Linear, and Granola transcripts.

• Without permission-aware pre-retrieval, you have a generic RAG wrapper where permission safety is left to the application layer. The failure mode is not theoretical: the system will eventually return a restricted document to the wrong principal, and the failure surface scales with every connector added. Unsellable to anyone with a compliance posture.

Sinapt is the product where all three constraints had to be solved at once, in one architecture, before any of it becomes useful. That intersection is small enough that few have built it. It is also where most of the value lives in the agent era — because it is where retrieval stops being a model call and starts being a substrate.

Synthetic Tests Answer Quality. Dogfood Answers Economics.

The next trap is cost modeling.

Human-search RAG cost models are too optimistic for agent-first products. They assume a user asks a question, reads an answer, and maybe asks a follow-up. That is not how agents behave when the knowledge layer is good.

Agents call memory as part of work loops. They retrieve before editing. They retrieve before planning. They retrieve before answering. They retrieve during verification. They retrieve again when the first answer exposes a second question. A human might run ten searches in a day. An agentic workflow can run that in one task.

So the synthetic spike and dogfood have different jobs.

The three-day synthetic spike answers quality and safety (a harness sketch follows the list):

  • Top-5 retrieval at or above 80% on the golden set. For at least 80 out of 100 hand-graded questions, the right source document has to appear in the top-5 chunks returned to the agent. Below that bar, the agent will hallucinate even with a perfect model.
  • Zero unauthorized chunks reaching the reranker or the agent. Even one is a fail. The whole point of pre-retrieval ACL filtering is that zero is a binary success metric, not a percentage to nudge upward over time.
  • p50 query latency under 500ms end-to-end. Including network, Aurora policy compilation, Turbopuffer hybrid query, RRF, Voyage rerank, and response serialization. Agents call retrieval frequently per session; if the median round-trip is half a second, the workflow stays fluid. If it's two seconds, agents stop calling.
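
A sketch of what that harness looks like. The golden-set format and helper names are assumptions; each item pairs a hard question with the sources that must appear in the top-5, plus planted restricted documents that must never appear at all.

```python
# Sketch of the three gate checks above. The golden-set format and the
# retrieve callable are assumptions; the gates themselves are from the text.
import time

def run_gates(golden_set, retrieve, caller):
    hits, leaks, latencies = 0, 0, []
    for item in golden_set:  # {"question", "expected_sources", "restricted_sources"}
        t0 = time.monotonic()
        chunks = retrieve(item["question"], caller)   # full path: policy -> hybrid -> rerank
        latencies.append(time.monotonic() - t0)
        top5_sources = {c["source_doc"] for c in chunks[:5]}
        if top5_sources & set(item["expected_sources"]):
            hits += 1
        # Planted restricted docs: anything the caller must never see.
        leaks += sum(1 for c in chunks
                     if c["source_doc"] in item.get("restricted_sources", []))
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    return {
        "top5_recall": hits / len(golden_set),   # gate: >= 0.80
        "unauthorized_chunks": leaks,            # gate: exactly 0
        "p50_latency_s": p50,                    # gate: < 0.5
    }
```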

Those gates are intentionally boring. If the system cannot return the right context quickly and respect permissions absolutely, no pricing model matters.

The five-day dogfood answers economics. Not with public cost numbers. Not with launch pricing. Not with a spreadsheet dressed up as conviction. Pricing stays in exploring mode until real usage patterns exist.

The question is call frequency. How often does a useful agent actually hit Sinapt when there is no local fallback and the cloud layer is the source of truth? What does one day of real work cost when the agent is allowed to use memory naturally? Where does caching help? Where does reranking dominate? Which query classes need answer synthesis and which only need context retrieval? How much does policy compilation cost under repeated loops?

That is what dogfood is for.

For agent-first products, cost is not a human-search spreadsheet with a new logo. The economics model has to come from actual agent call frequency, not from a human-search baseline that systematically under-predicts.
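
A parametric sketch of that reframing, with every number left as a named input rather than a price. The multiplication by agent call frequency is what the human-search baseline misses.

```python
# Parametric sketch of why agent call frequency dominates the cost model.
# Every value is a named input, not a real price; the multiplication is the point.
def daily_retrieval_cost(
    tasks_per_day: float,
    retrievals_per_task: float,     # agents: often 10-100x a human's per-session count
    cost_per_embed: float,
    cost_per_rerank: float,
    cost_per_policy_compile: float,
    cache_hit_rate: float = 0.0,    # fraction of retrievals served without embed + rerank
) -> float:
    calls = tasks_per_day * retrievals_per_task
    billable = calls * (1.0 - cache_hit_rate)
    return billable * (cost_per_embed + cost_per_rerank + cost_per_policy_compile)
```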

Phase Shape From Here

The phase shape is still the same, but now with sharper boundaries.

Phase 1 is architecture lock-in. That is the current phase. Layer 2, the search engine, is locked. Layer 5, the contract surfaces, is locked. Other layers still need the same treatment before the PoC begins. I am not trying to hide unfinished work behind confident prose. Locked means locked. Pending means pending.

Phase 2 is the PoC spike and dogfood. Single user. My corpus. Zero local fallback. The point is to force the cloud layer to stand on its own. If it quietly leans on QMD, the test is meaningless. The PoC either retrieves the right context, respects permissions, fits the latency envelope, and produces real usage economics, or the architecture changes.

Phase 3 is MVP. Multi-tenant. Stripe. First connectors. Web admin. The contract surfaces become more complete. MCP expands to the small MVP catalog. REST gets the SDK path. The policy compiler stops being a single-user control plane and becomes company infrastructure.

Phase 4 is public launch.

That is deliberately restrained. The first three Sinapt posts made the case for why the product should exist. This one is about whether the architecture is becoming serious enough to deserve implementation.

The answer is partially yes. Retrieval is locked. Surface contracts are locked. The permission invariant is locked. QMD's role is clarified. The cost question has been reframed around agent frequency instead of human search.

But the system has not earned louder language yet.

This is research in progress. The next step is not a launch, not a manifesto, not a victory lap. It is the spike, then dogfood, with the local baseline removed and the architecture forced to answer the only question that matters:

Can Sinapt make a company's knowledge retrievable by agents without leaking what those agents should not see?

📚 Related Reading

Sinapt and the Queryable Company. The market frame this build update sits inside. The knowledge tax, the 5 properties, why this product exists. → Read

RAG, From Crayons to PhD. The five-floor RAG pedagogy. Floor 4 is the 2026 production baseline that Sinapt's cloud retrieval layer implements. → Read

MCP, From Pidgin to Protocol. The MCP vs REST caller-lifetime rule applied directly to Sinapt's four surfaces. → Read

Claude Killed the API Key. Workload identity federation — the auth pattern at Sinapt's API gateway. → Read
💬 Working with a team that wants to adopt AI-native workflows at scale? I help engineering teams build this capability — workflow design, knowledge architecture, team training, and embedded engineering. → AI-Native Engineering Consulting