Paper-to-Code Reproduction on PaperBench Code (dev)
[Chart: scores over time, May 27, 2025 to Dec 2, 2025. Series: Final Score, Original Score, Average Improvement. Top entry: Paper2Code + auto-plan & code optimized, Final Score 78.6.]
Updated 9d ago
Evaluation Results
| Method | Setting | Links (date) | Final Score | Original Score | Average Improvement |
|---|---|---|---|---|---|
| Paper2Code + auto-plan & code optimized | LLM=GPT-4.1, Iter.=1 | 2025.12 | 78.6 | 68.2 | 15.25 |
| RePro | LLM=o3-mini-high, Iter.=5 | 2025.12 | 61.4 | 52.8 | 16.29 |
| AUTOREPRODUCE (w/ Visual Diagram) | Backbone=o3-mini | 2025.05 | 49.6 | - | - |
| AUTOREPRODUCE (Default Setting) | Backbone=o3-mini | 2025.05 | 48.5 | - | - |
| PaperCoder | Backbone=o3-mini | 2025.05 | 45.1 | - | - |
| AUTOREPRODUCE (w/o Paper Lineage) | Backbone=o3-mini | 2025.05 | 44.1 | - | - |
| IterativeAgent | Backbone=o1-high | 2025.05 | 43.4 | - | - |
| IterativeAgent | Backbone=o3-mini | 2025.05 | 17.3 | - | - |
| BasicAgent | Backbone=o3-mini | 2025.05 | 6.4 | - | - |
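The page does not define Average Improvement. The two reported values are, however, consistent with a relative percentage gain of Final Score over Original Score. The sketch below checks that reading; the formula and the function name `relative_improvement` are assumptions, not something the leaderboard states, and the metric may in fact be averaged per paper.

```python
def relative_improvement(final_score: float, original_score: float) -> float:
    """Percentage gain of final_score over original_score.

    Assumed formula (not stated on the page):
        100 * (final - original) / original
    """
    if original_score == 0:
        raise ValueError("original_score must be non-zero")
    return 100.0 * (final_score - original_score) / original_score


# (Final Score, Original Score) pairs copied from the two rows that report both.
rows = {
    "Paper2Code + auto-plan & code optimized": (78.6, 68.2),
    "RePro": (61.4, 52.8),
}

for method, (final_score, original_score) in rows.items():
    print(f"{method}: {relative_improvement(final_score, original_score):.2f}")
# Paper2Code + auto-plan & code optimized: 15.25
# RePro: 16.29
```

Both results match the reported column to two decimals, which supports the relative-gain reading without confirming it.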