TL;DR
- Three architectural failures: time, causality, provenance. Each is a property of transformer architecture, not a tuning problem.
- RAG mitigates none of them at the reasoning step; it only changes what the model is shown.
- The fix is neurosymbolic: a typed graph next to the model, with the model required to consult it.
Three structural failures — time, causality, provenance — each a property of transformer architecture, not a tuning problem. RAG fixes none of them at the reasoning step.
Temporality
A transformer reads a sequence; it does not see a timeline. Ask "when did Party A commit to X?" and the answer is guessed from local context, not reconstructed from a model of the timeline. Strip explicit dates from your prompt and the "reasoning" collapses (Chen et al., 2023).
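To make the contrast concrete, here is a minimal sketch of what "reconstructed from a model of the timeline" could look like. The `Event` record, the `first_occurrence` helper, and the sample dates are all hypothetical, invented for illustration rather than taken from any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One dated fact on the timeline (hypothetical schema)."""
    actor: str
    action: str
    obj: str
    on: date


def first_occurrence(events: List[Event], actor: str, action: str, obj: str) -> Optional[date]:
    """Return the earliest date on which `actor` did `action` to `obj`, or None."""
    matches = [e.on for e in events if (e.actor, e.action, e.obj) == (actor, action, obj)]
    return min(matches) if matches else None


# Illustrative data only; the events arrive out of order on purpose.
events = [
    Event("Party A", "commit", "X", date(2023, 5, 2)),
    Event("Party B", "object", "X", date(2023, 3, 20)),
    Event("Party A", "commit", "X", date(2023, 3, 14)),
]

answer = first_occurrence(events, "Party A", "commit", "X")
```

The answer here comes from comparing dates, not from the order in which events happened to appear in the input, and a question about an actor with no recorded events returns `None` instead of a guess.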
Causality
LLMs are statistical associators. What they know is "A and B appeared together in pre-training." By contrast, "A caused B via mechanism M under condition C" is a multi-hop inference over typed edges, and the model has no explicit structure that represents those edges. Chain-of-thought helps on the surface, but the kind of causal chain a mediator cares about requires maintaining and updating a graph (Kıcıman et al., 2023).
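As an illustration of what "typed edges" might mean in practice, the sketch below stores each causal link with its mechanism and condition and recovers a multi-hop chain with a breadth-first search. The `CausalEdge` type, the `causal_chain` function, and the example dispute are assumptions made up for this sketch.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CausalEdge:
    """'cause' led to 'effect' via 'mechanism' under 'condition' (hypothetical schema)."""
    cause: str
    effect: str
    mechanism: str
    condition: str


def causal_chain(edges: List[CausalEdge], start: str, goal: str) -> Optional[List[CausalEdge]]:
    """Breadth-first search for a shortest chain of edges from start to goal."""
    outgoing: Dict[str, List[CausalEdge]] = {}
    for edge in edges:
        outgoing.setdefault(edge.cause, []).append(edge)

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for edge in outgoing.get(node, []):
            if edge.effect not in seen:
                seen.add(edge.effect)
                queue.append((edge.effect, path + [edge]))
    return None  # no causal path recorded in the graph


# Illustrative edges only.
edges = [
    CausalEdge("late delivery", "missed launch", "no inventory on launch day", "no fallback supplier"),
    CausalEdge("missed launch", "penalty claim", "contractual penalty clause", "launch date was binding"),
]

chain = causal_chain(edges, "late delivery", "penalty claim")
```

Each hop in the returned chain carries its own mechanism and condition, so the direction of causation is explicit: asking for a path from "penalty claim" back to "late delivery" yields `None`, which co-occurrence statistics alone would not tell you.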
Provenance
Every claim a generic LLM emits is, architecturally, an uncited assertion. The model cannot point at the span a claim came from, and it cannot distinguish what it was told from what it interpolated. RAG gives you relevant chunks; reasoning over those chunks still happens inside the same architecture that flattens time, collapses causality, and invents sources.
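One way to picture provenance as a data structure: every claim carries a pointer to the exact span that supports it, and a claim whose pointer does not resolve is treated as unsupported. The `Claim` type, the `supporting_span` helper, and the sample document are hypothetical, and a real system would also need to check that the span actually entails the claim.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Claim:
    """A statement plus a pointer to the source span behind it (hypothetical schema)."""
    text: str
    doc_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def supporting_span(claim: Claim, documents: Dict[str, str]) -> Optional[str]:
    """Return the cited text, or None if the citation does not resolve."""
    doc = documents.get(claim.doc_id)
    if doc is None or not (0 <= claim.start < claim.end <= len(doc)):
        return None
    return doc[claim.start:claim.end]


# Illustrative source document and claims.
documents = {"email-1": "Party A agrees to deliver X by June."}
quote = "agrees to deliver X"
offset = documents["email-1"].index(quote)

cited = Claim("Party A committed to X", "email-1", offset, offset + len(quote))
uncited = Claim("Party A committed to Y", "email-9", 0, 10)

cited_text = supporting_span(cited, documents)
uncited_text = supporting_span(uncited, documents)
```

The check is deliberately mechanical: it separates "there is a span to point at" from "there is not", which is the distinction the paragraph above says the model cannot make on its own.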
The neurosymbolic move
Put an explicit structure — graph, typed edges, provenance — next to the model and make the model consult it. That is the bet: language models for what they do well (fluency, extraction, generation), graph reasoning for what they do badly (time, causality, sourcing). Each covers the other's blind spots.
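A rough sketch of "make the model consult it", under heavy assumptions: the language model is reduced to a stub that turns a question into a triple, the graph is a plain dictionary from triples to sources, and the function refuses to answer when the graph has no matching entry. `answer_with_graph`, `stub_extract`, and the data are invented for illustration and do not describe any specific product.

```python
from typing import Callable, Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def answer_with_graph(
    question: str,
    extract: Callable[[str], Triple],
    graph: Dict[Triple, str],
) -> str:
    """Answer only if the extracted triple is present in the graph, and cite its source."""
    triple = extract(question)   # the model's job: language to structure
    source = graph.get(triple)   # the graph's job: is this actually recorded?
    if source is None:
        return "No supported answer in the graph."
    subject, relation, obj = triple
    return f"{subject} {relation} {obj} [source: {source}]"


def stub_extract(question: str) -> Triple:
    """Stand-in for a model call; a real extractor would parse the question."""
    return ("Party A", "committed to", "X")


# Illustrative graph with a single sourced fact.
graph = {("Party A", "committed to", "X"): "email-1, 2023-03-14"}

supported = answer_with_graph("When did Party A commit to X?", stub_extract, graph)
unsupported = answer_with_graph("When did Party A commit to X?", stub_extract, {})
```

The division of labour mirrors the bet described above: the model handles extraction and phrasing, while the decision about whether a claim may be emitted at all rests on the graph lookup.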
SOURCES