Abductive Event Reasoning on Task 12 (test)
[Chart: Base Accuracy and Post-hoc Accuracy. Highlighted point: Ensemble Sonnet + GPT + Gemini, 92.6 Base Accuracy, Mar 4, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Base Accuracy | Post-hoc Accuracy |
|---|---|---|---|---|
| Ensemble Sonnet + GPT + Gemini | Type=Ens... | 2026.03 | 92.6 | 95.2 |
| GPT-5.2 | Type=Ind... | 2026.03 | 91.2 | 94.9 |
| Gemini 3 Flash | Type=Ind... | 2026.03 | 90.7 | 94.3 |
| Claude Son. 4.5 Thinking | Type=Ind... | 2026.03 | 90.4 | 95.2 |
| SC: Sonnet 3× (θ=0.50) | Type=Sel... | 2026.03 | 90.2 | 94.8 |
| SC: Gemini 5× (θ=0.50) | Type=Sel... | 2026.03 | 90.2 | 94.3 |