Evidence
TCGC is our open benchmark for grounded reasoning over typed knowledge — citation fidelity, temporal honesty, and structured extraction, in the domain where getting it wrong costs the most. It is the measuring stick we intend the field to use, which is why it is open, versioned, and reproducible.
Current state · 2026-06
Typed extraction F1
Did the system produce the right primitives with the right classes — actors, claims, commitments, events — not just fluent prose about them?
Edge fidelity
Are the relations typed and correct? “Rejects”, “commits-to”, and “contradicts” are different edges with different consequences.
Span fidelity
Does each cited span actually contain the claim attributed to it? This catches the failure that citation features hide: confident text pointing at the wrong receipt.
Temporal correctness
Both clocks present and right; interval relations between episodes consistent; superseded claims chained, not erased.
Contradiction preservation
Conflicting claims must survive as an explicit typed edge. Smoothing a dispute into one confident sentence scores zero on the item.
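To make the scoring idea concrete, here is a minimal sketch of how two of these checks could be computed over a typed subgraph. It is an illustration only: the authoritative task and scoring definitions live in the repository, and the names `Primitive`, `Edge`, `f1`, and `contradiction_preserved`, along with the field layout, are hypothetical. The sketch also assumes exact matching on every field and treats a "contradicts" edge as undirected, neither of which is confirmed by the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One extracted node (hypothetical shape): a class label plus a normalized key."""
    kind: str  # e.g. "actor", "claim", "commitment", "event"
    key: str   # normalized identifier for the primitive


@dataclass(frozen=True)
class Edge:
    """One typed relation between two primitive keys (hypothetical shape)."""
    source: str
    relation: str  # e.g. "rejects", "commits-to", "contradicts"
    target: str


def f1(predicted: set, gold: set) -> float:
    """Set-based F1 with exact matching, so a wrong class or edge type is a miss."""
    if not predicted and not gold:
        return 1.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def contradiction_preserved(predicted_edges: set, gold_edges: set) -> bool:
    """True only if every gold 'contradicts' pair also appears in the prediction.

    Assumption: direction is ignored, so A-contradicts-B matches B-contradicts-A.
    """
    def pairs(edges):
        return {frozenset((e.source, e.target))
                for e in edges if e.relation == "contradicts"}
    return pairs(gold_edges) <= pairs(predicted_edges)


# Toy item: the system finds all four primitives but mislabels one claim as an event.
gold_nodes = {Primitive("actor", "gov"), Primitive("actor", "union"),
              Primitive("claim", "c1"), Primitive("claim", "c2")}
pred_nodes = {Primitive("actor", "gov"), Primitive("actor", "union"),
              Primitive("claim", "c1"), Primitive("event", "c2")}

gold_edges = {Edge("c1", "contradicts", "c2")}
kept_edges = {Edge("c2", "contradicts", "c1")}   # dispute kept as a typed edge
smoothed_edges = {Edge("c1", "rejects", "c2")}   # dispute retyped, so it is lost

typed_f1 = f1(pred_nodes, gold_nodes)
edge_f1_smoothed = f1(smoothed_edges, gold_edges)
kept_ok = contradiction_preserved(kept_edges, gold_edges)
smoothed_ok = contradiction_preserved(smoothed_edges, gold_edges)
```

In this toy item the mislabeled primitive costs both precision and recall, and retyping the "contradicts" edge as "rejects" fails the preservation check even though the two claims are still connected.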
Why these five
They are the disciplines a Context Capsule enforces — so the benchmark measures exactly the gap between fluent AI and checkable AI, on the hardest available material.
git clone https://github.com/sargonxg/TCGC_TACITUS-Conflict-Grammar-Corpus_BENCHMARK tcgc
cd tcgc
# corpus items: items/v0.1-sample/
# task schema: schema/tcgc-v0.1.json
# run your system over an item, emit the typed subgraph, score per the schema
If a published score on this page cannot be reproduced from the repository, treat it as wrong and tell us: hello@tacitus.me.
A number appears on this page only when it comes from a published, reproducible run. Until then, the state is “in progress” and says so.
The first published scores will be frontier-model baselines on the open corpus — including where they do well. Our own submission lands when the eval harness is complete.
Corpus, task definitions, and scoring all live in the open repository. If you can’t re-run it, we don’t cite it.