TL;DR
- Frontier models made generation cheap. For institutions whose documents must survive scrutiny, the bottleneck moved: it is no longer producing the draft, it is trusting it.
- Independent evaluation backs this up: IFIT found every major LLM offering confident, dangerous conflict-resolution advice without due diligence. Citation features do not fix it: citing is not verifying.
- Verified context means the context itself carries the audit: source receipts, trust tiers, two clocks, human review decisions, and a runtime contract the consuming agent must obey.
- This property appreciates as models improve. Better generation makes unverifiable output cheaper and more abundant, and verified context correspondingly more valuable.
Frontier models made drafting cheap; checking stayed expensive. Citation is not verification — institutions run on warrant. What verified context looks like as a data structure, and why the property appreciates as models improve.
The bottleneck moved
Between 2023 and 2026 the cost of producing a fluent, well-structured draft collapsed. Nothing comparable happened to the cost of checking one. For a policy desk, a legal team, or an analysis unit, every AI-produced paragraph still has to be verified by the same scarce expert it was meant to relieve — and verification at document granularity is slow, judgment-heavy work. The arithmetic is unforgiving: if checking costs what writing used to, the model has not saved the institution anything.
The industry’s answer has been citation features. They help, but they conflate two different acts. A citation says “this text relates to that source.” Verification says “a competent person examined this claim against that source, judged the support, recorded the judgment, and is named.” The first is retrieval; the second is warrant. Institutions run on warrant.
What verification looks like as a data structure
- Receipts, not links. A claim binds to a source span (document, character range, content hash), so “check this” is a constant-time act, not a search.
- Trust tiers with consequences. T1 vetted, T2 corroborated, T3 needs corroboration, with a runtime rule per tier: assert, attribute, hedge. The tier travels into generation.
- Two clocks. What happened (valid time) and when it became known (observation time) are separate fields. “What did we know on the 14th?” becomes a query.
- Named review. Approve, approve-with-caveats, reject: recorded per record, with the reviewer’s role attached, kept inside the object forever.
- A contract on use. must_cite, must_surface_disputed, citation style: the capsule instructs the agent, not the other way round.
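The five properties above can be sketched as one small data structure. This is a minimal illustration in Python, not the actual Context Capsules schema: every class, field, and function name below (Receipt, Record, Capsule, make_receipt, known_as_of, rule_for) is hypothetical, except must_cite and must_surface_disputed, which come from the list. The tier-to-rule mapping (T1 assert, T2 attribute, T3 hedge) is read off the order in the text and is an assumption, as are the sample claim and dates.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
import hashlib


class Tier(Enum):
    T1 = "vetted"
    T2 = "corroborated"
    T3 = "needs corroboration"


# Assumed mapping, inferred from the order in which the text lists the rules.
TIER_RULE = {Tier.T1: "assert", Tier.T2: "attribute", Tier.T3: "hedge"}


@dataclass(frozen=True)
class Receipt:
    """Binds a claim to a source span: document, character range, content hash."""
    doc_id: str
    start: int
    end: int
    sha256: str

    def verify(self, source_text: str) -> bool:
        # Checking is one slice plus one hash comparison, not a search.
        span = source_text[self.start:self.end]
        return hashlib.sha256(span.encode("utf-8")).hexdigest() == self.sha256


def make_receipt(doc_id: str, source_text: str, start: int, end: int) -> Receipt:
    digest = hashlib.sha256(source_text[start:end].encode("utf-8")).hexdigest()
    return Receipt(doc_id, start, end, digest)


@dataclass(frozen=True)
class Review:
    decision: str        # "approve" | "approve_with_caveats" | "reject"
    reviewer_role: str   # kept inside the record


@dataclass(frozen=True)
class Record:
    claim: str
    receipt: Receipt
    tier: Tier
    valid_on: date       # clock 1: when it happened
    observed_on: date    # clock 2: when it became known
    review: Review


@dataclass(frozen=True)
class Contract:
    must_cite: bool = True
    must_surface_disputed: bool = True
    citation_style: str = "inline"


@dataclass
class Capsule:
    records: list[Record]
    contract: Contract = field(default_factory=Contract)

    def known_as_of(self, day: date) -> list[Record]:
        """Answer 'what did we know on this day?' using observation time."""
        return [r for r in self.records if r.observed_on <= day]


def rule_for(record: Record) -> str:
    """The tier travels into generation as a rule the agent must follow."""
    return TIER_RULE[record.tier]


# Usage with made-up data.
source = "The ceasefire took effect on 12 March."
receipt = make_receipt("doc-001", source, 0, len(source))
early = Record("Ceasefire began 12 March.", receipt, Tier.T1,
               date(2025, 3, 12), date(2025, 3, 13),
               Review("approve", "regional analyst"))
late = Record("Ceasefire began 12 March (second report).", receipt, Tier.T3,
              date(2025, 3, 12), date(2025, 3, 20),
              Review("approve_with_caveats", "desk editor"))
capsule = Capsule([early, late])

print(receipt.verify(source))                          # source unchanged
print(receipt.verify(source.replace("12", "21")))      # source altered
print([r.claim for r in capsule.known_as_of(date(2025, 3, 14))])
print(rule_for(late))
```

Keeping the review and the tier inside the record, rather than in a separate log, is what lets a downstream output inherit the audit: anything that copies the record copies its warrant along with it.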
Why this thesis strengthens with model progress
Every capability the labs ship makes generation cheaper and more confident — including confidently wrong generation. The supply of plausible text grows without bound; the supply of warranted knowledge grows only as fast as experts can review. Whatever makes expert review cheap, portable, and reusable is therefore the appreciating asset in the stack. That is the design brief Context Capsules answer: audit the context once, inherit the audit in every output built on it.
Bessemer’s 2025 State of AI named “memory and context” the new moats. We would sharpen it: unverified memory is a liability with good recall. The moat is verified context.
SOURCES
- [1] IFIT (2025). Initiative on AI and Conflict Resolution: LLM risk findings.
- [2] Bessemer Venture Partners (2025). The State of AI.
- [3] Content Authenticity Initiative (2026). The State of Content Authenticity. The C2PA precedent: markets adopt cryptographic provenance, but so far for media. Knowledge objects are the open gap.