TL;DR
- OSINT in 2026 is no longer a tooling problem; it is a *grounding* problem. Models can summarise everything; they cannot tell you which source is load-bearing.
- The structural moves are: cite at the span level, attribute every claim to a source class, model source reliability as a typed property rather than a vibe, and treat contradiction as data.
- Existing public datasets (ACLED, GDELT 2.0, UCDP, ReliefWeb, OCHA HDX, Wikidata, OpenStreetMap conflict layers) have specific failure modes the analyst must model explicitly.
- The TACITUS approach: ARGUS does ingestion and provenance binding; AGON classifies the claims; KAIROS holds the timeline; the graph holds the contradictions.
What changed in 2025–2026
- Long-context frontier models (1M+ tokens) make naive summarisation cheap and naive sourcing cheaper. The bottleneck moved up to the question of which sources to trust, in which combinations, and with which biases. That is the analyst's job, and the tooling has to scaffold it.
- Synthetic-content detection became unreliable enough that "is this image AI-generated" is no longer a yes/no answer; it is a posterior over generation processes. Treat it that way in the schema.
- Geographic OSINT is constrained by the open-imagery cadence (Sentinel, Planet). Real-time claims need temporal triangulation across modalities.
- Language coverage outside English remains uneven. ACLED, GDELT, and UCDP each over- or under-cover specific regions; treat coverage as a typed property of the source.
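One way to read "a posterior over generation processes" in schema terms is sketched below. The process labels, the `GenerationPosterior` class, and the detector likelihoods are all illustrative assumptions, not part of any published TACITUS schema; the only idea taken from the text is that the field stores a distribution rather than a boolean.

```python
from dataclasses import dataclass

# Hypothetical process labels; the article does not enumerate them.
PROCESSES = ("camera_original", "edited", "ai_generated")


@dataclass(frozen=True)
class GenerationPosterior:
    """Probability mass over how a media item was produced."""
    probs: dict

    def __post_init__(self):
        if set(self.probs) != set(PROCESSES):
            raise ValueError("posterior must cover every process label")
        if any(p < 0 for p in self.probs.values()):
            raise ValueError("probabilities must be non-negative")
        if abs(sum(self.probs.values()) - 1.0) > 1e-6:
            raise ValueError("probabilities must sum to 1")

    def update(self, likelihoods):
        """Bayes update: multiply by per-process likelihoods, then renormalise."""
        unnorm = {k: self.probs[k] * likelihoods[k] for k in PROCESSES}
        total = sum(unnorm.values())
        if total == 0:
            raise ValueError("evidence is impossible under every process")
        return GenerationPosterior({k: v / total for k, v in unnorm.items()})


# Usage: start from an assumed prior, fold in one (made-up) detector signal.
prior = GenerationPosterior(
    {"camera_original": 0.5, "edited": 0.3, "ai_generated": 0.2}
)
posterior = prior.update(
    {"camera_original": 0.2, "edited": 0.3, "ai_generated": 0.9}
)
print(posterior.probs)
```

Storing the full distribution lets a later signal (metadata, a second detector, imagery from another modality) shift the estimate without overwriting an earlier yes/no verdict.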
Practical patterns we use
- Every Claim is bound to a source_span: (doc_id, char_start, char_end). Document-level citations are insufficient.
- Confidence is expressed as AttributionClaim.confidence_tier ∈ {low, moderate, high} rather than a free-form percentage. Calibration beats false precision.
- Two independent sources are required for any high-confidence attribution; the validator rejects single-source highs.
- Source ontology distinguishes Primary (witness, signed text), Official (state, IGO), Press (named outlet), Aggregator (ACLED, GDELT), Open (forum, social), AI-derived. Provenance carries the class.
- Contradictions across sources are first-class objects with materiality (material | cosmetic). The system surfaces the disagreement; the analyst adjudicates.
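The patterns above can be sketched as a small typed schema. This is a minimal illustration, not the actual TACITUS data model: the field names `source_id`, `text`, and `note`, the `validate` function, and the choice to approximate "independent" as "distinct `source_id`" are all assumptions. Real independence checks would also have to catch cases such as two outlets republishing the same wire report.

```python
from dataclasses import dataclass
from enum import Enum


class SourceClass(Enum):
    PRIMARY = "primary"        # witness, signed text
    OFFICIAL = "official"      # state, IGO
    PRESS = "press"            # named outlet
    AGGREGATOR = "aggregator"  # ACLED, GDELT
    OPEN = "open"              # forum, social
    AI_DERIVED = "ai_derived"


class ConfidenceTier(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


class Materiality(Enum):
    MATERIAL = "material"
    COSMETIC = "cosmetic"


@dataclass(frozen=True)
class SourceSpan:
    doc_id: str
    char_start: int
    char_end: int
    source_id: str             # hypothetical: who produced the document
    source_class: SourceClass  # provenance carries the class

    def __post_init__(self):
        if not (0 <= self.char_start < self.char_end):
            raise ValueError("span must be non-empty with non-negative offsets")


@dataclass(frozen=True)
class AttributionClaim:
    text: str
    confidence_tier: ConfidenceTier
    spans: tuple  # tuple of SourceSpan


@dataclass(frozen=True)
class Contradiction:
    """A disagreement between two claims, kept as data for the analyst."""
    claim_a: AttributionClaim
    claim_b: AttributionClaim
    materiality: Materiality
    note: str = ""


def validate(claim):
    """Return a list of rule violations; an empty list means the claim passes."""
    errors = []
    if not claim.spans:
        errors.append("claim has no source_span")
    # Assumption: distinct source_id stands in for independence.
    distinct_sources = {span.source_id for span in claim.spans}
    if claim.confidence_tier is ConfidenceTier.HIGH and len(distinct_sources) < 2:
        errors.append("high confidence requires two independent sources")
    return errors


# Usage: a single-source "high" is rejected; adding a second source passes.
press = SourceSpan("doc-1", 10, 42, "outlet-a", SourceClass.PRESS)
witness = SourceSpan("doc-2", 0, 17, "witness-b", SourceClass.PRIMARY)
print(validate(AttributionClaim("Actor X did Y", ConfidenceTier.HIGH, (press,))))
print(validate(AttributionClaim("Actor X did Y", ConfidenceTier.HIGH, (press, witness))))
```

Returning a list of violations rather than raising keeps the validator usable in a review interface, where the analyst sees every failed rule at once.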
What we will not do
We will not score sources behind the analyst's back. Reliability tiers are exposed, editable, and version-controlled. We will not auto-attribute incidents to actors above moderate confidence without two independent sources. We will not generate prose that sounds like an intelligence product when the underlying evidence does not support it. The tag is the contract.
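As a rough illustration of "exposed, editable, and version-controlled" reliability tiers, the sketch below keeps an append-only history of tier changes. The `ReliabilityRegistry` and `TierChange` names, their fields, and the tier labels used in the example are hypothetical; a real deployment might simply keep the tiers in a file under ordinary version control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TierChange:
    source_id: str
    tier: str      # free-form label here; the real tier vocabulary is not specified
    editor: str    # who made the change, so no score is set anonymously
    reason: str
    at: datetime


@dataclass
class ReliabilityRegistry:
    """Append-only log: edits add entries, nothing is overwritten."""
    history: list = field(default_factory=list)

    def set_tier(self, source_id, tier, editor, reason):
        change = TierChange(source_id, tier, editor, reason,
                            datetime.now(timezone.utc))
        self.history.append(change)
        return change

    def current(self, source_id):
        """Latest tier for a source, or None if it was never rated."""
        for change in reversed(self.history):
            if change.source_id == source_id:
                return change.tier
        return None

    def log(self, source_id):
        """Full, ordered edit history for one source."""
        return [c for c in self.history if c.source_id == source_id]


# Usage: two analysts edit the same source; both edits stay visible.
registry = ReliabilityRegistry()
registry.set_tier("outlet-a", "moderate", "analyst-1", "initial assessment")
registry.set_tier("outlet-a", "low", "analyst-2", "retracted two reports")
print(registry.current("outlet-a"))
print(len(registry.log("outlet-a")))
```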
SOURCES