Paper-to-code reproduction on PaperBench Code ICML 2024 (dev)
[Chart: Average Score by date. Top entry: paper2code + auto-plan & code optimized, 0.786 (Dec 2, 2025). Selectable metrics: Average Score, Median Score, Win Rate, Avg. Improvement, Max Improvement.]
Updated 1mo ago
Evaluation Results
| Method | LLM | Links | Average Score | Median Score | Win Rate | Avg. Improvement | Max Improvement |
| paper2code + auto-plan & code optimized | GPT-4.1 | 2025.12 | 0.786 | 0.827 | - | 15.25 | 56.88 |
| paper2code + auto-code optimized | GPT-4.1 | 2025.12 | 0.747 | 0.787 | - | 9.53 | 42.98 |
| paper2code + auto-plan optimized | GPT-4.1 | 2025.12 | 0.723 | 0.768 | - | 6.01 | 58.23 |
| paper2code | GPT-4.1 | 2025.12 | 0.682 | 0.692 | - | - | - |
| paper2code + self-refine in plan | GPT-4.1 | 2025.12 | 0.655 | 0.655 | - | -3.96 | 32.08 |
| RePro | o3-mini-high | 2025.12 | 0.626 | - | - | - | - |
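The page does not define the Avg. Improvement column. The reported values are, however, numerically consistent with the relative change in Average Score over the plain paper2code baseline (0.682), expressed in percent. The sketch below checks that reading. The formula and the helper name `relative_improvement_pct` are assumptions made for illustration, not a definition taken from the benchmark.

```python
# Assumption: Avg. Improvement = (score - baseline) / baseline * 100,
# with plain paper2code (Average Score 0.682) as the baseline.
# The benchmark page does not state this; the numbers merely agree with it.

BASELINE_AVERAGE_SCORE = 0.682  # paper2code, LLM=GPT-4.1

# Average Score values copied from the results table.
average_scores = {
    "paper2code + auto-plan & code optimized": 0.786,
    "paper2code + auto-code optimized": 0.747,
    "paper2code + auto-plan optimized": 0.723,
    "paper2code + self-refine in plan": 0.655,
}


def relative_improvement_pct(score: float, baseline: float) -> float:
    """Return the relative change of `score` over `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


improvements = {
    method: round(relative_improvement_pct(score, BASELINE_AVERAGE_SCORE), 2)
    for method, score in average_scores.items()
}

for method, pct in improvements.items():
    print(f"{method}: {pct:+.2f}%")
```

Under this assumption the computed values match the table's Avg. Improvement entries (15.25, 9.53, 6.01, -3.96). Max Improvement cannot be recovered the same way, since it presumably depends on per-paper scores that the page does not list.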