
Reflective Paper-to-Code Reproduction Enabled by Fine-Grained Verification

About

Reproducing machine learning papers is essential for scientific progress but remains challenging for both humans and automated agents. Existing agent-based methods often struggle to fully and accurately reproduce implementation details such as mathematical formulas and algorithmic logic. Previous studies show that reflection with explicit feedback improves agent performance, yet current paper reproduction methods fail to adopt this strategy effectively. This gap arises mainly from the diverse paper patterns, complex method modules, and varied configurations encountered in research papers. Motivated by how humans use systematic checklists to debug complex code efficiently, we propose RePro, a Reflective Paper-to-Code Reproduction framework that automatically extracts a paper's fingerprint: a comprehensive set of accurate, atomic criteria that serve as high-quality supervisory signals. The framework first generates code from the extracted information, and then uses the fingerprint within an iterative verification and refinement loop. This loop systematically detects discrepancies and produces targeted revisions that align the generated code with the paper's implementation details. In extensive experiments on the PaperBench Code-Dev benchmark, RePro achieves a 13.0% performance gap over baselines, and during reflection it correctly revises complex logical and mathematical criteria, which is where its effectiveness is most evident.
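The verification and refinement loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Criterion`, `reflect_and_refine`, and `toy_revise` are hypothetical, and the string checks stand in for what would presumably be model-based verification and revision in the actual framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One atomic, checkable statement from a paper's fingerprint (hypothetical type)."""
    description: str
    check: Callable[[str], bool]  # True when the code satisfies the criterion


def reflect_and_refine(
    code: str,
    fingerprint: List[Criterion],
    revise: Callable[[str, List[Criterion]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[int]]:
    """Verify code against every criterion, then revise until all pass or rounds run out."""
    history = []  # number of failed criteria observed in each round
    for _ in range(max_rounds):
        failed = [c for c in fingerprint if not c.check(code)]
        history.append(len(failed))
        if not failed:
            break
        code = revise(code, failed)  # targeted revision driven by the failed criteria
    return code, history


# Toy stand-ins (assumptions): substring checks instead of real verification.
fingerprint = [
    Criterion("uses softmax attention", lambda src: "softmax" in src),
    Criterion("learning rate is 1e-3", lambda src: "lr=1e-3" in src),
]

FIXES = {
    "uses softmax attention": "attn = softmax(scores)",
    "learning rate is 1e-3": "opt = SGD(lr=1e-3)",
}


def toy_revise(src: str, failed: List[Criterion]) -> str:
    """Fix only the first failing criterion per round, to show the iteration."""
    return src + "\n" + FIXES[failed[0].description]


final_code, history = reflect_and_refine("model = Net()", fingerprint, toy_revise)
```

In this toy run the count of failed criteria drops by one each round, because the stand-in reviser fixes a single criterion at a time. The fingerprint acts as the explicit feedback signal that tells the reviser what to change.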

Mingyang Zhou, Quanming Yao, Lun Du, Lanning Wei, Da Zheng • 2025

Related benchmarks

Task | Dataset | Result | Rank
Paper-to-Code Reproduction | PaperBench Code (dev) | Final Score: 61.4 | 9
Paper-to-Code Reproduction | PaperBench Code ICML 2024 (dev) | Average Score: 0.626 | 6