Deterministic Event-Graph Substrates as World Models for Counterfactual Reasoning
About
We study event-graph substrates: a class of world models that represent agent state as an append-only log of typed RDF triples and answer counterfactual queries by forking the log under a structured intervention vocabulary. Substrates are inspectable at the triple level, support exact counterfactuals, and transfer across domains without learned components. We formalize the class, prove a duality between explanatory and counterfactual queries that reduces both to the same causal-ancestor traversal, and evaluate a 1,400-line CLEVRER-DSL interpreter atop a domain-agnostic substrate runtime at full CLEVRER validation scale (n=75,618). The substrate exceeds the NS-DR symbolic oracle on all four per-question categories (by 9.89, 20.26, 17.65, and 0.80 percentage points). It also exceeds the parametric ALOE baseline on descriptive and explanatory questions, while lagging on predictive and counterfactual questions. We also introduce twin-EventLog, a 500-specification Park-canonical Smallville counterfactual benchmark on which the substrate exceeds Llama-3.1-8B with full context by 18.80 points in joint accuracy.
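To make the idea of an append-only triple log with forking and causal-ancestor traversal concrete, the following is a minimal sketch, not the paper's runtime. All names (`Event`, `EventLog`, `fork_without`, `would_still_occur`) and the example triples are hypothetical. It also assumes a deliberately simple intervention vocabulary (removing one event) and a simplified semantics in which an event occurs only if all of its recorded causes occur.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Event:
    eid: int                 # position in the log
    triple: Triple           # the typed fact recorded by this event
    causes: FrozenSet[int]   # ids of earlier events that caused this one


class EventLog:
    """Hypothetical append-only log; events are never edited in place."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, triple: Triple, causes: Iterable[int] = ()) -> int:
        eid = len(self._events)
        cause_set = frozenset(causes)
        # Causes must already be in the log, so the graph stays acyclic.
        if any(not 0 <= c < eid for c in cause_set):
            raise ValueError("causes must reference earlier events")
        self._events.append(Event(eid, triple, cause_set))
        return eid

    def triples(self) -> List[Triple]:
        return [ev.triple for ev in self._events]

    def ancestors(self, eid: int) -> Set[int]:
        """Explanatory query: every event that transitively caused `eid`."""
        seen: Set[int] = set()
        stack = list(self._events[eid].causes)
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self._events[cur].causes)
        return seen

    def fork_without(self, removed: int) -> "EventLog":
        """Counterfactual fork: copy the log, dropping `removed` and
        every event that has it as a causal ancestor."""
        fork = EventLog()
        new_id = {}  # old event id -> id in the fork
        for ev in self._events:
            if ev.eid == removed or removed in self.ancestors(ev.eid):
                continue
            new_id[ev.eid] = fork.append(
                ev.triple, [new_id[c] for c in ev.causes]
            )
        return fork


def would_still_occur(log: EventLog, target: int, removed: int) -> bool:
    """Counterfactual query answered by the same ancestor traversal."""
    return target != removed and removed not in log.ancestors(target)


# Illustrative scene; the triples are made up for this sketch.
log = EventLog()
approach = log.append(("ball_a", "moves_toward", "ball_b"))
collide = log.append(("ball_a", "collides_with", "ball_b"), [approach])
exit_b = log.append(("ball_b", "exits", "scene"), [collide])
enter_c = log.append(("cube_c", "enters", "scene"))
```

In this sketch, "why did `ball_b` exit?" is `log.ancestors(exit_b)`, and "would `ball_b` still exit without the collision?" is `would_still_occur(log, exit_b, collide)`. Both reduce to one traversal, which is the kind of duality the abstract describes. The original log is left untouched by `fork_without`, so the factual and counterfactual histories can be compared triple by triple.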
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Counterfactual Video Reasoning | CLEVRER (val) | Accuracy | 86.69 | 5 |
| Explanatory Video Reasoning | CLEVRER (val) | Accuracy | 99.94 | 5 |
| Predictive Video Reasoning | CLEVRER (val) | Accuracy | 84.07 | 5 |
| Counterfactual reasoning | Twin-EventLog Smallville context | Joint Accuracy (A ∧ B) | 100 | 3 |
| Descriptive Video Reasoning | CLEVRER (val) | Accuracy | 97.99 | 3 |