SOTA Code Reasoning benchmarks and papers with code | Wizwand
Code Reasoning
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| LiveCodeBench | Gemini-3.0 | Accuracy | 87.4 | 62 | 25d ago |
| CRUXEval | Gemini-3-Pro-preview | Input-CoT Accuracy | 98.8 | 56 | 1mo ago |
| HumanEval | DeepSeek-R1-Distill-Qwen-14B (Reasoning) | HumanEval Score | 95.73 | 40 | 19d ago |
| HumanE | Denser | Accuracy | 84.9 | 35 | 1mo ago |
| MBPP | Qwen2.5-Coder-CFB-Aug | MBPP Execution Accuracy | 84.7 | 33 | 1mo ago |
| CRUXEval-O | Kimi-K2 Base | Accuracy | 83.5 | 28 | 1mo ago |
| MBPP | Denser | Accuracy | 67.3 | 23 | 1mo ago |
| CRUX | RMoA | Accuracy | 87.37 | 23 | 1mo ago |
| CRUXEval | Qwen2.5-Math-72B-Instruct | Accuracy | 68.6 | 21 | 1mo ago |
| LiveCodeBench 1.0 (test) | A3PO | Accuracy | 47.2 | 18 | 1mo ago |
| CruxEval Output | DataFlow-Code-10K | Score | 51 | 12 | 1mo ago |
| LiveCodeBench | REBALANCE | Pass@1 Accuracy | 88.3 | 11 | 8d ago |
| LiveCodeBench (LCB) v6 | INTUITOR | Accuracy | 15.3 | 8 | 1mo ago |
| LCB | SCF-RKL | pass@1 | 62.46 | 8 | 1mo ago |
| CodeForces | LAD | Rating | 1,533.64 | 6 | 1mo ago |
| HumanEval+ | LAD | Average Score @16 | 82.29 | 6 | 1mo ago |
| LiveCodeBench | LAD | Avg@16 | 33.51 | 6 | 1mo ago |
| CRUX-O | HSA-UL | Accuracy | 40.75 | 6 | 1mo ago |
| LiveCodeBenchPro Med 25Q2 | Nemotron-Cascade-2 30B-A3B | Pass@1 | 36.8 | 5 | 29d ago |
| LiveCodeBenchPro Easy 25Q2 | Nemotron-Cascade-2 30B-A3B | Pass@1 | 89.3 | 5 | 29d ago |
| LiveCodeBench 2408-2505 v6 | Nemotron-Cascade-2 30B-A3B | Pass@1 | 88.4 | 5 | 29d ago |
| MBPP base and extended (out-of-distribution) | InftyThink+ | Accuracy | 55.83 | 5 | 1mo ago |
| HumanEval base and extended (out-of-distribution) | InftyThink+ | Accuracy | 0.677 | 5 | 1mo ago |
| SciCode | Nemotron-3-Super 120B-A12B | Pass@1 | 42.1 | 4 | 29d ago |
| CRUXEval I | Kimi-K2 Base | Accuracy | 74 | 4 | 1mo ago |
Showing 25 of 28 rows