Experiment Reproduction on PaperBench Code (dev)
Best score: 64.1 (HiRAS), Apr 20, 2026
Evaluation Results
Method         Model                 Date     Score
HiRAS          Claude-Sonnet         2026.04  64.1
HiRAS          DeepSeek-v3.1-Te...   2026.04  57.4
PaperCoder     Claude-Sonnet         2026.04  51.1
AutoReproduce  o3-mini               2026.04  48.5
HiRAS          Qwen3-Coder-480B      2026.04  45.7
AutoReproduce  DeepSeek-v3.1-Te...   2026.04  41.5
PaperCoder     DeepSeek-v3.1-Te...   2026.04  40.8
PaperCoder     Qwen3-Coder-480B...   2026.04  36.9
AutoReproduce  Qwen3-Coder-480B...   2026.04  31.8