Share your thoughts, 1 month free Claude Pro on us
See more
Home
/
Benchmarks
Counterfactual Reasoning on Twin-EventLog Smallville context
Loading...
100
Joint Accuracy (A ∧ B)
substrate
32.4
49.95
67.5
85.05
May 15, 2026
Joint Accuracy (A ∧ B)
Per-Arm Accuracy
Divergence Detection Accuracy
N Specifications Count
Updated 16d ago
Evaluation Results
Method
Method
Links
Joint Accuracy (A ∧ B)
Per-Arm Accuracy
Divergence Detection Accuracy
N Specifications Count
substrate
Reasoning=Correct by c...
2026.05
100
100
100
500
Llama-3.1-8B direct
Prompting strategy=pro...
2026.05
81.2
88
86.4
500
Concordia-style
Interface=Concordia's...
2026.05
35
-
79
100
Feedback
Search any
task
Search any
task