TL;DR
- Policy and political work is knowledge work over typed structure. The dominant stack — LLM + RAG over unstructured text — fails on five measurable axes: temporal reasoning, causal precision, span-level provenance, contradiction handling, and long-horizon context.
- A larger model does not patch any of these. They are architectural failures, not training-data failures.
- TACITUS builds the substrate the stack is missing: a kernel ontology, bi-temporal grounding, Ontology-Augmented Generation, and a seven-graph engine that holds typed reasoning across analysts, desks, and electoral cycles.
- We are a technology company, not a think tank. The product is the infrastructure.
Policy and political work is knowledge work over typed structure. The dominant 2024–2026 stack fails on five measurable axes — time, causality, provenance, contradiction, long-horizon context. A larger model patches none of them. Here is what the corrective looks like.
The thing that is missing
Analysts produce structured outputs (memos, options papers, briefings, agreements, compliance reviews) from unstructured inputs (cables, transcripts, statutes, reports). The structure they produce — who claimed what, when, against which constraint, with what leverage, under which commitment, in which framing — is the load-bearing artefact. In current institutional practice it is re-derived per analyst per case from prose representations. The institution does not persist the typed reasoning. It persists the prose.
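The typed structure described above — who claimed what, when, against which constraint, with what provenance — can be sketched as a minimal record. All names here (`Claim`, `Span`, the field set) are illustrative assumptions for this post, not TACITUS's actual ontology:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """Span-level provenance: exactly where in the source a claim lives."""
    doc_id: str  # source document (cable, transcript, statute)
    start: int   # character offset where the extracted text begins
    end: int     # character offset where it ends

@dataclass(frozen=True)
class Claim:
    """One unit of typed analyst reasoning, persisted instead of prose."""
    actor: str        # who claimed what
    statement: str
    valid_from: str   # when the claim held in the world (ISO date)
    constraint: str   # against which constraint
    commitment: str   # under which commitment
    framing: str      # in which framing
    source: Span      # span-level provenance

# A hypothetical instance, re-derivable by any analyst from the record itself:
c = Claim(
    actor="MFA spokesperson",
    statement="ceasefire terms accepted in principle",
    valid_from="2025-03-01",
    constraint="sanctions regime",
    commitment="bilateral memorandum",
    framing="de-escalation",
    source=Span(doc_id="cable-0412", start=220, end=268),
)
```

The point is not the schema but the persistence: once the institution stores objects like this rather than the memo that summarized them, the typed reasoning survives the analyst who produced it.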
Five concrete failures
- F1 · Temporal flattening. Events become an ordered list; valid-time vs transaction-time collapses into a single descriptive sentence.
- F2 · Causal collapse. Multi-hop inferences (A caused B via mechanism M under condition C) reduce to co-occurrence or simple succession.
- F3 · Provenance absence. RAG reduces hallucination but operates at document granularity; the span a claim was extracted from is not preserved through generation.
- F4 · Contradiction averaging. Conflicting passages get smoothed into a single confident paraphrase. WikiContradict and ConflictBank measured this across frontier model families.
- F5 · Context decay under self-generation. As agentic systems accumulate their own outputs, accuracy decays in ways that look fluent. Letta's Context-Bench puts the ceiling at ~74% on contamination-proof multi-step tasks for strong 2025–2026 frontier models.
The corrective
Put typed structure outside the model. Bind every generated claim to a source span. Represent contradictions as first-class objects. Enforce temporality at the data layer rather than the prompt layer. The neural component does what it does well — fluent extraction over noisy text — under hard validation against a typed graph it cannot drift away from.
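The span-binding step can be sketched as a hard gate: a generated claim is accepted only if its cited span exists in the source and still contains the quoted text. Function and field names are assumptions for illustration, not the DIALECTICA API:

```python
def validate_claim(sources: dict[str, str], claim_text: str,
                   doc_id: str, start: int, end: int) -> bool:
    """Reject any generated claim whose cited span is missing or has drifted."""
    doc = sources.get(doc_id)
    if doc is None or not (0 <= start < end <= len(doc)):
        return False                     # cited span does not exist: reject
    return doc[start:end] == claim_text  # text drifted from source: reject

# Toy source corpus and two candidate claims:
sources = {"cable-0412": "... ceasefire terms accepted in principle ..."}
ok = validate_claim(sources, "ceasefire terms accepted in principle",
                    "cable-0412", 4, 41)
drifted = validate_claim(sources, "ceasefire terms rejected",
                         "cable-0412", 4, 41)
```

The design choice is that the check runs outside the model, against the typed store; the generator can be fluent and wrong, but a wrong claim never enters the graph.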
That is the TACITUS bet. Engine: DIALECTICA. Ontology: ACO. Pattern: OAG. Benchmark: TCGC. Open source: kernel + pipeline. We publish methodology, ship infrastructure, and stay out of the editorial business.
SOURCES
- [1] Hou et al. (2024). WikiContradict. NeurIPS D&B.
- [2] Su et al. (2024). ConflictBank. NeurIPS D&B.
- [3] Letta (2025). Context-Bench. Multi-step contamination-proof agentic tasks.
- [4] Anthropic (2025). Effective context engineering for AI agents.
- [5] Sharma et al. (2025). OG-RAG. EMNLP.
- [6] Edge et al. (2024). GraphRAG.