Causal Reasoning on CLadder

99.89 Exact Match

ARYA

Updated 4mo ago

Evaluation Results

Method	Links	Exact Match
ARYA 2026.03		99.89	-	-
Llama3.1-8B-Instruct 2026.01		88	66	90
Claude Opus 4.6 2026.03		87.2	-	-
Gemma-7B-it 2026.01		80	58	84
GPT-4 2026.03		76.4	-	-
GPT-5.2 2026.03		67.8	-	-
Mistral-7B 2026.01		58	80	94
Llama3.1-8B 2026.01		57	60	73
Claude Opus 4.6 2026.03		50.9	-	-