Abstract
Policy and political work is knowledge work under pressure. It runs on cables, reports, transcripts, statutes, and briefs that an institution accumulates faster than any one analyst can read. The work breaks at three places: when context fragments across tools, when reasoning fails to inherit from the analyst who leaves to the desk that stays, and when the AI on top hallucinates the structure rather than reading it. We propose an AI knowledge layer as the fix: a typed, source-bound, time-ordered graph that turns institutional reading into reasoning the next analyst, the next desk, and the next model can inherit. The layer is built on a kernel ontology with dynamic extensions — eight universal primitives that do not change, plus per-case subclasses learned from the work and validated against the kernel. Inference runs through Ontology-Augmented Generation (OAG): typed, grounded in source spans, contested by design, temporally honest. Conflict reasoning is the first specialization we measure (TCGC) because it stresses every property of the layer at once. Policy options memos, stakeholder analysis, regulatory contestation, and mediation work are next. We invite the field to fork and improve everything.
1. The problem
Ask any policy desk officer, mission analyst, regulatory lawyer, mediator, or party staffer what they actually do all day, and the answer rhymes: read a lot of text, figure out who said what to whom and when, reconcile conflicting accounts, track commitments across time, and produce a structured artifact — a memo, a briefing, a SITREP, an options paper, a compliance review — that an institution can act on.
The structure is the product. The structure is also invisible. It lives entirely in the practitioner’s head, and when she leaves the institution it leaves with her. Almost everything she uses to do this work — email, spreadsheets, word processors, generic AI chat — was built for a different job.
The resulting asymmetry, between whoever holds the structure and whoever needs it, shows up at three scales simultaneously: between the parties to a policy or political question, inside the institution that has to decide, and across the AI tools the institution uses to think. Generic chat-on-top-of-PDFs collapses all three into the same fluent surface, which is why senior practitioners read it once, find it impressive, and then return to their notes.
2. Inheritance as architecture
Institutional knowledge work has a specific failure mode that consumer software does not have to think about: handoff. The analyst who built the file rotates out. The desk officer who held the dossier moves on. The mediator who ran the last round retires. What gets inherited, today, is a folder of PDFs and a few decks. What needs to be inherited is the reasoning: which actor owns which interest, which commitment is still active, which event triggered which policy shift, which narrative stopped being credible last March.
Generic AI assistants do not solve this. They store text and generate text. They do not maintain typed structure across sessions. The 2025 work on context engineering [24] and Letta’s Context-Bench [25] document the same failure mode formally: as agents accumulate self-generated context, accuracy decays in ways that look fluent and read confident. An institution that stores its reasoning in a chat transcript is paying that bill twice: once on the way in, once on the way back out.
The knowledge layer treats inheritance as a first-class architectural property. Every claim is a typed object with a source span. Every commitment is bi-temporal. Every extension a particular case taught the system is logged, named, and reviewable. The graph survives the analyst. The next desk reads the graph, not the rumor of it. Adjacent work in temporally evolving knowledge graphs (MedKGent 2025 [18], Graphiti / Neo4j 2025 [26]) builds the same shape in medicine and agentic memory; the policy and political domain has not had its version of this yet. We are building it.
This is the move that distinguishes a knowledge layer from a chatbot. A chatbot is a query interface. A knowledge layer is the institution’s typed reasoning, persisted, contestable, and inheritable.
3. What the literature already knows
Three bodies of literature converge on the same finding: institutional reasoning is structured, not noise. Conflict theory — Fisher and Ury [1], Glasl [2], Galtung [3], Axelrod [4] — names actors, claims, interests, positions, escalation, and cooperation as recurring components of any dispute. Moral-foundations work from Haidt [5] and value-conflict research from Sunstein [6] give us the frames people argue inside.
Public administration and policy-process research adds the institutional layer: the same primitives that organize a dispute also organize a policy file — stakeholders, claims of evidence, regulatory constraints, leverage, commitments to deadlines, events on the calendar, contested narratives. The Kingdon multiple-streams view, the advocacy-coalition framework, and the punctuated-equilibrium tradition all rely on these objects without typing them.
The literature converges on a finding that nobody has yet operationalized: policy and political reasoning is a typed structure. The same primitives appear in workplace mediation, commercial arbitration, multilateral negotiation, legislative analysis, regulatory contestation, and intergenerational family disputes. The grammar travels. Until now, it has lived in the analyst’s head and left the institution when she did.
4. Kernel + dynamic extensions
The closed-ontology failure mode. SNOMED, FIBO, LKIF, the Gene Ontology — extraordinary in their domains, rigid by design, unable to extend without committee. Policy and political work is a dirtier domain. A Commitment in HR mediation, a ceasefire in a peace process, a vendor SLA, a regulatory pledge, and a campaign promise are all Commitments — and they have entirely different shapes. A fixed ontology that flattens those into one schema either over-specializes (loses domain transfer) or under-specializes (loses the case). Bian (Oct 2025) [19] names schema rigidity as one of the three enduring challenges in classical knowledge-graph construction.
The schema-free failure mode. Pure agentic graph builders without a backbone — see the agentic-deep-graph-reasoning literature [17] — produce locally coherent but globally inconsistent graphs. Two analysts looking at the same file end up with two ontologies. Provenance survives, interoperability dies. Inheritance becomes impossible.
The kernel-with-extensions third way. Eight primitives at the core (actor, claim, interest, constraint, leverage, commitment, event, narrative). Universal across cases. Versioned. Forkable. Below the kernel: domain-specific subclasses learned per case (HR-Commitment, Ceasefire-Commitment, SLA-Commitment, RegulatoryCommitment, CampaignCommitment), validated against the kernel’s invariants, logged with provenance, reviewable by the practitioner. The LLMs4OL 2025 challenge on automated ontology induction [20], OntoEKG (Feb 2026) [22], and MedKGent (2025) [18] all point in this direction.
Why this is the right bet for policy and political work specifically. Policy and political work is the highest-stakes knowledge domain that has no operationalized ontology. Medicine has SNOMED. Law has LKIF. Finance has FIBO. Policy and political work — the domain that runs on prose, memory, and hand-written briefings — has none. Building this kernel openly, in 2026, is the most useful thing TACITUS Research can do.
THE EIGHT PRIMITIVES
Actor (who participates), Claim (what is asserted), Interest (what lies beneath), Constraint (what limits outcomes), Leverage (what holds the power), Commitment (what was agreed), Event (what happened when), Narrative (what frames it). Read the live grammar with worked examples — including a single Commitment rendered in three domains — at /research/grammar.
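A minimal sketch of how a kernel with dynamic extensions could be represented, assuming an append-only extension log and a simple "required fields" invariant. The eight primitive names and the extension names come from the text; the field names (`committing_actor`, `content`, `source_span`, `monitoring_body`, `service_level`), the `ExtensionRegistry` class, and the invariant itself are hypothetical illustrations, not the published kernel.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Primitive(Enum):
    """The eight kernel primitives named in the text."""
    ACTOR = "actor"
    CLAIM = "claim"
    INTEREST = "interest"
    CONSTRAINT = "constraint"
    LEVERAGE = "leverage"
    COMMITMENT = "commitment"
    EVENT = "event"
    NARRATIVE = "narrative"


# ASSUMPTION: a kernel invariant is modeled here as a set of fields that every
# subclass of a primitive must keep. The field names are illustrative only.
KERNEL_REQUIRED_FIELDS = {
    Primitive.COMMITMENT: frozenset({"committing_actor", "content", "source_span"}),
}


@dataclass(frozen=True)
class Extension:
    name: str            # e.g. "Ceasefire-Commitment"
    parent: Primitive    # the kernel primitive it specializes
    fields: frozenset    # fields the subclass carries
    proposed_by: str     # provenance: who proposed the extension
    case_id: str         # provenance: which case taught it


@dataclass
class ExtensionRegistry:
    log: list = field(default_factory=list)  # append-only, so it stays reviewable

    def register(self, ext: Extension) -> None:
        """Validate an extension against the kernel, then log it."""
        required = KERNEL_REQUIRED_FIELDS.get(ext.parent, frozenset())
        missing = required - ext.fields
        if missing:
            raise ValueError(f"{ext.name} drops kernel fields: {sorted(missing)}")
        if any(entry["extension"].name == ext.name for entry in self.log):
            raise ValueError(f"{ext.name} is already registered")
        self.log.append({"extension": ext, "logged_at": datetime.now(timezone.utc)})

    def subclasses_of(self, parent: Primitive) -> list:
        return [e["extension"].name for e in self.log if e["extension"].parent is parent]


base = frozenset({"committing_actor", "content", "source_span"})
registry = ExtensionRegistry()
registry.register(Extension("Ceasefire-Commitment", Primitive.COMMITMENT,
                            base | {"monitoring_body"}, "analyst-a", "case-001"))
registry.register(Extension("SLA-Commitment", Primitive.COMMITMENT,
                            base | {"service_level"}, "analyst-b", "case-002"))
print(registry.subclasses_of(Primitive.COMMITMENT))
```

Both subclasses add their own fields while keeping the ones the kernel requires; an extension that dropped `source_span` would be rejected before it reached the log.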
5. Bi-temporal grounding
Valid time vs. transaction time. A commitment was made on Monday (valid time). The desk learned about it on Friday (transaction time). On Saturday, one party retroactively claims the commitment was actually a “tentative idea”: a new claim, made now, about the past. A static graph cannot represent this. A bi-temporal graph can, and must, for the data to be honest.
Why this is structurally different from event sourcing. Event sourcing tracks system state. Bi-temporal grounding tracks truth state across time. MedKGent (2025) [18] on temporally evolving medical knowledge graphs and the Graphiti / Neo4j 2025 architecture [26] articulate the same shape in adjacent domains. Policy and political work is, if anything, more demanding: positions shift, commitments are renegotiated, events are reinterpreted, narratives drift. A knowledge layer that pretends Monday was Tuesday is not legible.
The minimum viable bi-temporal claim. Every claim in the TACITUS graph carries: source span, asserter, valid-time interval, transaction-time interval, contestation status. A claim can be true-then-revised, true-then-contested, asserted-then-disclaimed. None of those is an error; all of those are the data the next analyst needs to inherit.
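The five fields listed above can be sketched as a small record type with an "as of" query. This is an illustrative sketch under stated assumptions: half-open intervals, a superseded record whose transaction interval is closed rather than deleted, and hypothetical names throughout (`Claim`, `Interval`, `as_of`, the document id `cable-17`, the dates). It replays the Monday, Friday, and Saturday example from this section.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Contestation(Enum):
    UNCONTESTED = "uncontested"
    CONTESTED = "contested"
    REVISED = "revised"
    DISCLAIMED = "disclaimed"


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: Optional[datetime] = None  # None means still open

    def contains(self, t: datetime) -> bool:
        # Half-open interval: start is included, end is excluded.
        return self.start <= t and (self.end is None or t < self.end)


@dataclass(frozen=True)
class Claim:
    text: str
    source_span: tuple          # (document id, start offset, end offset)
    asserter: str
    valid_time: Interval        # when the claim is said to hold in the world
    transaction_time: Interval  # when the graph held this version of the record
    status: Contestation = Contestation.UNCONTESTED


def as_of(claims, valid_at, known_at):
    """Claims holding at `valid_at`, according to what the graph knew at `known_at`."""
    return [c for c in claims
            if c.valid_time.contains(valid_at) and c.transaction_time.contains(known_at)]


# Hypothetical dates: 2 March 2026 is a Monday.
MON, TUE, WED = datetime(2026, 3, 2), datetime(2026, 3, 3), datetime(2026, 3, 4)
FRI, SAT, SUN = datetime(2026, 3, 6), datetime(2026, 3, 7), datetime(2026, 3, 8)

span = ("cable-17", 210, 268)
claims = [
    # Learned on Friday; this version is superseded on Saturday, not deleted.
    Claim("Party A commits to X", span, "Party A", Interval(MON), Interval(FRI, SAT)),
    # The same commitment, re-recorded on Saturday as contested.
    Claim("Party A commits to X", span, "Party A", Interval(MON), Interval(SAT),
          Contestation.CONTESTED),
    # The retroactive claim: made on Saturday, about Monday.
    Claim("The commitment was a tentative idea", ("statement-3", 0, 41), "Party A",
          Interval(MON), Interval(SAT)),
]

print(len(as_of(claims, valid_at=TUE, known_at=WED)))  # desk had not learned of it yet
print(len(as_of(claims, valid_at=TUE, known_at=FRI)))  # the uncontested commitment
print(len(as_of(claims, valid_at=TUE, known_at=SUN)))  # contested version plus counter-claim
```

The same question about Tuesday gets three different answers depending on when it is asked, and none of the earlier answers is overwritten.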
6. Ontology-Augmented Generation
Definition. Ontology-Augmented Generation (OAG) is a generation pattern in which a language model produces output that is (i) typed — every produced object is an instance of a kernel primitive or a validated extension; (ii) grounded — every claim cites the source span it was extracted from; (iii) contested — counter-claims and contradictions are first-class objects in the graph the LLM consults, not noise to be averaged away; (iv) temporally honest — every claim carries valid-time and transaction-time stamps the LLM is required to respect.
| Pattern | Schema | Provenance | Contestation | Temporality |
|---|---|---|---|---|
| RAG | None | Document-level | Averaged | Flattened |
| GraphRAG | Auto-induced, flat | Entity-level | Averaged | Mostly flattened |
| OG-RAG | Domain ontology, fixed | Entity-level | Not modeled | Not modeled |
| Think-on-Graph | Domain KG, fixed | Path-level | Not modeled | Not modeled |
| OAG (TACITUS) | Kernel + dynamic extensions | Span-level, bi-temporal | First-class objects | Bi-temporal |
Sources for the first four rows, in order: Lewis et al. 2020 on RAG, Edge et al. 2024 on GraphRAG [12], Sharma et al. 2025 on OG-RAG [15], and Sun et al. 2024 on Think-on-Graph [16]. See also Hou et al. 2024 on WikiContradict [13] for the contestation gap.
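One way to read the four properties in the OAG definition is as an acceptance check applied to each object a model produces. The sketch below is a hedged illustration of that reading, not the TACITUS pipeline: `GeneratedObject`, `oag_violations`, the field names, and the example ids and file name are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

KERNEL = frozenset({"actor", "claim", "interest", "constraint",
                    "leverage", "commitment", "event", "narrative"})


@dataclass(frozen=True)
class GeneratedObject:
    id: str
    type_name: str                              # kernel primitive or extension name
    source_span: Optional[tuple] = None         # (document id, start, end)
    valid_from: Optional[datetime] = None       # valid-time stamp
    recorded_at: Optional[datetime] = None      # transaction-time stamp
    cites_counterclaims: frozenset = frozenset()  # ids of counter-claims it surfaces


def oag_violations(obj, validated_extensions, known_counterclaims):
    """Return the OAG properties that `obj` fails; an empty list means it passes."""
    problems = []
    # (i) typed: a kernel primitive or a validated extension
    if obj.type_name not in KERNEL and obj.type_name not in validated_extensions:
        problems.append("untyped")
    # (ii) grounded: must cite a source span
    if obj.source_span is None:
        problems.append("ungrounded")
    # (iii) contested: counter-claims already in the graph must be surfaced
    if not set(known_counterclaims) <= set(obj.cites_counterclaims):
        problems.append("ignores contestation")
    # (iv) temporally honest: both time stamps present
    if obj.valid_from is None or obj.recorded_at is None:
        problems.append("missing time stamps")
    return problems


extensions = {"RegulatoryCommitment"}
good = GeneratedObject("g1", "RegulatoryCommitment", ("consultation.txt", 120, 164),
                       datetime(2026, 3, 2), datetime(2026, 3, 2), frozenset({"c3"}))
bad = GeneratedObject("g2", "Promise")  # not in the kernel, no span, no stamps

print(oag_violations(good, extensions, {"c3"}))
print(oag_violations(bad, extensions, {"c3"}))
```

A check like this only tests that the structure is present. Whether the cited span actually supports the claim still needs a separate verification step.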
Worked example. A regulatory consultation produces a Commitment from the agency on Monday: “the rule will be redrafted before Q3.” A trade body acknowledges ambiguously the same morning. On Thursday, the agency spokesperson denies scope: “the redraft is exploratory.” OAG types each speech act against the kernel, stamps the bi-temporal intervals, and creates a contestation edge between the original Commitment and the later denial, instead of averaging the three turns into a smoothed paraphrase. The structure is what the desk inherits.
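The three turns in the worked example can be written out as data to show what "not averaging" means in practice. This is a sketch under assumptions: the `SpeechAct` and `Edge` types, the edge labels `acknowledges` and `contests`, the ids, and the dates are hypothetical; the two quotations are the ones given in the text, and the trade body's wording is left empty because the text does not give it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SpeechAct:
    id: str
    type_name: str         # kernel primitive or validated extension
    asserter: str
    quote: Optional[str]   # None where the source gives no wording
    valid_from: datetime   # when the asserted state of affairs is said to start
    recorded_at: datetime  # when the desk recorded the speech act


@dataclass(frozen=True)
class Edge:
    kind: str    # "acknowledges" or "contests" (labels assumed for this sketch)
    source: str
    target: str


# Hypothetical dates: 2 March 2026 is a Monday, 5 March 2026 a Thursday.
# ASSUMPTION: the desk records each act on the day it is made.
MON, THU = datetime(2026, 3, 2, 9), datetime(2026, 3, 5, 9)

acts = [
    SpeechAct("c1", "RegulatoryCommitment", "agency",
              "the rule will be redrafted before Q3", MON, MON),
    SpeechAct("c2", "claim", "trade body", None, MON, MON),
    # The denial is made on Thursday but is about the scope of Monday's statement.
    SpeechAct("c3", "claim", "agency spokesperson",
              "the redraft is exploratory", MON, THU),
]
edges = [Edge("acknowledges", "c2", "c1"), Edge("contests", "c3", "c1")]


def contested_ids(acts, edges):
    """Ids of speech acts that are the target of at least one contestation edge."""
    targets = {e.target for e in edges if e.kind == "contests"}
    return [a.id for a in acts if a.id in targets]


print(contested_ids(acts, edges))
print(len(acts))  # all three turns survive as separate objects
```

The original Commitment stays in the graph with its Monday stamp; the Thursday denial is attached to it rather than replacing it.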
APPENDIX · PRODUCTS AS DEMONSTRATIONS
PRAXIS is the AI analyst workbench that surfaces the knowledge layer to professionals — desks, missions, mediation teams, policy units. DIALECTICA is the engine. Wind Tunnel and CONCORDIA are side projects that reuse the same backbone for reception modeling and live deliberation. Every product reads and writes the same graph. The point of the products is to show the layer works in production, not to be the research artifact.
7. Contestability as architecture
Most ontologies assume an authoritative voice. Medical: there is one truth about a diagnosis. Legal: a contract has one binding interpretation (eventually). Telecom: a network configuration is or isn’t valid.
Policy and political work has no authoritative voice by design. The domain is the disagreement. Forcing a resolution at the schema layer is a category error.
Holding disagreement structurally. The TACITUS graph carries: claim and counter-claim as paired objects; evidence and counter-evidence as typed edges; commitment and broken-commitment as bi-temporal events; narrative drift as a primitive in its own right. WikiContradict (Hou et al., 2024) [13] and ConflictBank [14] are the benchmarks that started naming this gap.
Why this matters for the practitioner. The system does not tell the desk officer, mediator, or analyst what is true. It shows them the shape of the disagreement. The decision is human. The structure is the product.
8. Why LLMs alone cannot do this
Three structural failures, all architectural rather than training-data problems. Temporality is flattened: transformers see sequences, not timelines, and cannot reliably reconstruct when a commitment was made, broken, or renegotiated (Chen et al. 2023 [9]). Causality collapses to co-occurrence: multi-hop inferences of the form “A caused B via mechanism M under condition C” exceed what attention can reliably construct (Kıcıman et al. 2023 [10]). Provenance is absent by construction: every claim a generic LLM produces is, architecturally, an uncited assertion (Ji et al. 2023 [11]); RAG reduces but does not eliminate the problem because the generation step is still the same transformer.
Two newer failure modes round out the diagnosis. WikiContradict (Hou et al. 2024) [13] shows LLMs systematically mishandle conflicting evidence unless contradiction is represented explicitly. Letta’s Context-Bench (2025) [25] documents cascading agent failure under self-generated context: as agents accumulate their own outputs, accuracy decays in ways that look fluent and read confident. This is the failure mode that punishes institutions hardest, because handoff, the moment when one analyst hands the file to the next, is exactly when self-generated context is highest and ground truth is least available.
By 2026 these failures are not contested. The interesting question is no longer whether to add structure, but what kind, who maintains it, and how it inherits.
9. Specializations
The knowledge layer is a single architecture. Specializations are the per-domain extension stacks that ride on top of it. We work on each specialization in public, with its own benchmark and its own surface in PRAXIS.
- Conflict reasoning. The first specialization, because it stresses everything: time, causality, provenance, commitment tracking, interest/position separation, narrative drift, cross-actor contradiction. Measured by TCGC. v0.2 introduces dynamic-ontology task types: schema-extension induction, kernel-invariant validation, cross-domain primitive transfer.
- Policy options analysis. Stakeholder map, interest landscape, constraint inventory, leverage assessment, options memo. Same kernel, different extensions: PolicyOption, RegulatoryConstraint, JurisdictionalLeverage. Specialization benchmark in design.
- Mediation and ADR. Live deliberation, commitment tracking, narrative drift, agreement drafting. Surfaces in CONCORDIA. Builds on the same kernel; extension stack is HR-Commitment, AgreementClause, FacilitationEvent, RoomNarrative.
- Regulatory contestation. Consultations, rulemakings, legal challenges. Kernel + extensions for RegulatoryAct, StakeholderResponse, JudicialReviewEvent.
Each specialization is a public artifact: extension schema, worked examples, benchmark methodology where it exists, an open issue tracker. We do not promise every specialization at once. We promise the kernel underneath them is the same one, and that the work shows in public.
10. Risks
Misuse for adversarial analysis. The same engine that helps a mediator structure a workplace dispute helps a corporate adversary map a counterparty’s leverage; the same engine that helps a policy desk assemble an options memo helps a campaign assemble an opposition file. We limit early deployments to partners with stated, reviewable use cases. We do not provide the engine to actors whose declared purpose is one-sided coercion.
Institutional capture. The map can become a mechanism for whoever controls the platform to win. We mitigate by publishing the kernel, by logging every dynamic extension with provenance, and by keeping the graph forkable. Capture survives openness only if nobody else picks up the fork.
Schema drift. Per-case extensions, by design, can drift. A Commitment in HR mediation should not silently become a Commitment in a regulatory consultation. We address this with kernel invariants the dynamic extensions must validate against, and with a public review surface for proposed extensions. We do not pretend this problem is solved.
Training-data leakage. Policy and political data is, by definition, sensitive. We do not train on partner-provided graphs. The reference engine is run on-prem or in a partner-controlled tenant. We publish methodology, not cases.
11. Open science as architecture
Four commitments keep the scientific layer honest: the kernel is public and forkable; the dynamic-extension log is reviewable; the pipeline is MIT-licensed; the benchmark is a shared artifact, not a competitive moat. A company that makes claims about institutional reasoning and keeps them behind closed doors does not deserve the benefit of the doubt.
12. Invitations
This paper is an invitation. Four invitations, specifically:
- Read the kernel. Disagree in public.
- Open PRAXIS or DIALECTICA on a file you know well — a peace process, a rulemaking, a mediation, an HR investigation — and tell us where the dynamic extensions break.
- Propose a new TCGC task type, or propose a new specialization benchmark (policy options, regulatory contestation, ADR).
- If you run a pilot, write up what worked and what did not, openly.
Institutional reasoning is structured. Until now, the structure was invisible and uninheritable. We are making it visible — and inheritable — in the open.
References
Foundations
- [1] Fisher, R., Ury, W., Patton, B. (1981). Getting to Yes: Negotiating Agreement Without Giving In. Houghton Mifflin.
- [2] Glasl, F. (1999). Confronting Conflict: A First-Aid Kit for Handling Conflict. Hawthorn Press.
- [3] Galtung, J. (1990). Cultural Violence. Journal of Peace Research, 27(3), 291–305.
- [4] Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.
- [5] Haidt, J. (2012). The Righteous Mind: Why Good People Are Divided by Politics and Religion. Pantheon.
- [6] Sunstein, C. R. (1994). Incommensurability and Valuation in Law. Michigan Law Review, 92(4), 779–861.
- [7] Hogan, A. et al. (2021). Knowledge Graphs. ACM Computing Surveys, 54(4).
- [8] Garcez, A. d'A., Lamb, L. C. (2023). Neurosymbolic AI: The 3rd Wave. Artificial Intelligence Review, 56(11).
- [9] Chen, W. et al. (2023). Benchmarking Large Language Models on Temporal Reasoning. arXiv:2306.08952.
- [10] Kıcıman, E. et al. (2023). Causal Reasoning and Large Language Models. arXiv:2305.00050.
- [11] Ji, Z. et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55(12).
2025–2026 frontier
- [12] Edge, D. et al. (2024). From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv:2404.16130.
- [13] Hou, Y. et al. (2024). WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts. NeurIPS 2024 Datasets & Benchmarks Track. arXiv:2406.13805.
- [14] Su, Z. et al. (2024). ConflictBank: A Benchmark for Evaluating the Robustness of LLMs to Knowledge Conflicts.
- [15] Sharma, K., Kumar, P., Li, Y. (2025). OG-RAG: Ontology-Grounded Retrieval-Augmented Generation. EMNLP 2025. ACL Anthology 2025.emnlp-main.1674. arXiv:2412.15235.
- [16] Sun, J. et al. (2024). Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. ICLR 2024.
- [17] Buehler, M. (2025). Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks. arXiv:2502.13025.
- [18] Zhang, D. et al. (2025). MedKGent: A Large Language Model Agent Framework for Constructing Temporally Evolving Medical Knowledge Graphs. arXiv:2508.12393.
- [19] Bian, H. (2025). LLM-Empowered Knowledge Graph Construction: A Survey. arXiv:2510.20345.
- [20] Beliaeva, A., Rahmatullaev, T. (2025). Heterogeneous LLM Methods for Ontology Learning. LLMs4OL 2025 Challenge, ISWC 2025. arXiv:2508.19428.
- [21] Zhang, X. et al. (2025). OntoURL: A Benchmark for Evaluating LLMs on Symbolic Ontological Understanding, Reasoning and Learning. arXiv:2505.11031.
- [22] Author(s) (2026). OntoEKG: LLM-Driven Ontology Construction for Enterprise Knowledge Graphs. arXiv:2602.01276.
- [23] Author(s) (2026). OntoKG: Ontology-Oriented Knowledge Graph Construction with Intrinsic-Relational Routing. arXiv:2604.02618.
- [24] Anthropic (2025). Effective Context Engineering. Anthropic engineering blog.
- [25] Letta (2025). Context-Bench: Benchmarking LLMs on Agentic Context Engineering. letta.com/blog/context-bench.
- [26] Neo4j (2025). Graphiti: Knowledge Graph Memory for an Agentic World. neo4j.com/blog/developer/graphiti-knowledge-graph-memory.
- [27] ACLED. Conflict categories and codebook methodology.
- [28] United Nations and World Bank (2018). Pathways for Peace: Inclusive Approaches to Preventing Violent Conflict.
- [29] NIST (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0).
- [30] OECD (2025). Governing with Artificial Intelligence: Are governments ready?
Cite this
@techreport{tacitus2026knowledgelayer,
title = {The AI Knowledge Layer for Policy and Political Work: Kernel Ontology, Dynamic Extensions, and Ontology-Augmented Generation},
author = {{TACITUS Research}},
year = {2026},
note = {Version 1.1},
url = {https://www.tacitus.me/research/vision}
}