Causal Reasoning on Cladder AceReason (Reduced)
Top result: 80.2 Accuracy (Model-first Greedy)
[Chart: Accuracy by date; data point at May 21, 2026]
Updated 8d ago
Evaluation Results
Method                    Settings                    Date     Accuracy
Model-first Greedy        k=5, Summarizer=AceReason   2026.05  80.2
Best-model                k=5, Summarizer=AceReason   2026.05  79.3
Input-all                 k=5, Summarizer=AceReason   2026.05  79
Top-accuracy              k=5, Summarizer=AceReason   2026.05  78
Conditioned-diversity     k=5, Summarizer=AceReason   2026.05  76.5
Oracle-surrogate Greedy   k=5, Summarizer=AceReason   2026.05  76.5
GPT5.2-judge              k=5, Summarizer=AceReason   2026.05  76.3
Truth-prediction Greedy   k=5, Summarizer=AceReason   2026.05  76.1
MoA                       k=5, Summarizer=AceReason   2026.05  76
Aya-judge                 k=5, Summarizer=AceReason   2026.05  74.8