SOTA Reasoning benchmarks and papers with code | Wizwand
Reasoning
Benchmarks
| Dataset Name | SOTA Method | Metric | Score | Results | Last Updated |
| --- | --- | --- | --- | --- | --- |
| BBH | GHG-TDA | Accuracy | 95.4 | 726 | 6d ago |
| ARC | Qwen-7B-Instruct | Accuracy | 94.5 | 245 | 4d ago |
| MMLU-Pro | Agent Q-Mix | Accuracy | 92.86 | 241 | 22h ago |
| ARC Easy | GPT-4 | Accuracy | 96.63 | 233 | 7d ago |
| HellaSwag (HS) | Mistral-Small | HellaSwag Accuracy | 91.84 | 209 | 6d ago |
| GPQA Diamond | Gemini-3.0 Pro | Accuracy | 91.9 | 185 | 15d ago |
| WinoGrande (WG) | InternLM2-20B | Accuracy | 85.2 | 168 | 12d ago |
| PIQA | LLaDA2.0-flash | Accuracy | 96.5 | 164 | 6d ago |
| ARC-c | EMoE | Accuracy | 90.85 | 112 | 1d ago |
| GSM8K | GPT-5.2 | Accuracy | 1 | 111 | 18h ago |
| 7-benchmark commonsense and reading-comprehension suite (ARC-Easy, ARC-Challenge, HellaSwag, WinoGrande, PIQA, BoolQ, and OpenBookQA) LM Evaluation Harness default (test) | LATMiX-LU | Accuracy | 68.77 | 108 | 3mo ago |
| ARC Challenge | Qwen3 | Accuracy | 97.2 | 100 | 7d ago |
| MATH 500 | Gemini-3-Pro | Accuracy (%) | 100 | 94 | 22d ago |
| BBH (test) | Agent-GWO | Accuracy | 73.9 | 94 | 14d ago |
| OpenBookQA | BioBridge | Accuracy | 88.4 | 92 | 13d ago |
| GPQA | Claude 3.5 Sonnet | Accuracy | 59.4 | 88 | 22h ago |
| ARC Challenge | GPT-4o | Accuracy | 96.7 | 81 | 16d ago |
| LiveBench Reasoning | DIP | Accuracy | 92 | 80 | 3mo ago |
| GSM PRO | ZERO-SHOT | Accuracy | 100 | 72 | 1mo ago |
| Reasoning Benchmarks BBH, MMLU, ARC-C, ThmQA (test) | Teacher | BBH | 64.66 | 66 | 6d ago |
| AIME 24 | Qwen3-Base SAT | Accuracy on AIME 24 | 86.3 | 65 | 25d ago |
| HLE | DeepSeek-V3.2 | Accuracy (HLE Reasoning) | 40.8 | 63 | 1mo ago |
| BIG-Bench Hard (BBH) (test) | GPT-4o | Average Accuracy | 87.3 | 62 | 14d ago |
| Humanity's Last Exam | HEART | Accuracy | 84.61 | 60 | 15d ago |
| AIME 24 | PETS-On. | Accuracy | 70 | 58 | 2mo ago |
Showing 25 of 594 rows