Causal Reasoning on CLadder
[Chart: Exact Match, LLM Score, and DOVERIFIER Score over time, Jan 29, 2026 to Mar 22, 2026. Top result: ARYA, 99.89 Exact Match.]
Updated 25d ago
Evaluation Results
Method                Details                    Date     Exact Match  LLM Score  DOVERIFIER Score
--------------------  -------------------------  -------  -----------  ---------  ----------------
ARYA                  Prompting Strategy=Zer...  2026.03  99.89        -          -
Llama3.1-8B-Instruct  Parameters=8B, Backbon...  2026.01  88           66         90
Claude Opus 4.6       Prompting Strategy=Opt...  2026.03  87.2         -          -
Gemma-7B-it           Parameters=7B, Backbon...  2026.01  80           58         84
GPT-4                 Context=Best Published...  2026.03  76.4         -          -
GPT-5.2               Prompting Strategy=Zer...  2026.03  67.8         -          -
Mistral-7B            Parameters=7B, Backbon...  2026.01  58           80         94
Llama3.1-8B           Parameters=8B, Backbon...  2026.01  57           60         73
Claude Opus 4.6       Prompting Strategy=Zer...  2026.03  50.9         -          -