Reflective Paper-to-Code Reproduction Enabled by Fine-Grained Verification

About

Reproducing machine learning papers is essential for scientific progress but remains challenging for both humans and automated agents. Existing agent-based methods often struggle to fully and accurately reproduce implementation details such as mathematical formulas and algorithmic logic. Previous studies show that reflection with explicit feedback improves agent performance, yet current paper reproduction methods fail to adopt this strategy effectively. This gap mainly arises from the diverse paper patterns, complex method modules, and varied configurations encountered in research papers. Motivated by how humans use systematic checklists to efficiently debug complex code, we propose RePro, a Reflective Paper-to-Code Reproduction framework that automatically extracts a paper's fingerprint: a comprehensive set of accurate, atomic criteria that serve as high-quality supervisory signals. The framework first generates code based on the extracted information, and then leverages the fingerprint within an iterative verification and refinement loop. This approach systematically detects discrepancies and produces targeted revisions to align the generated code with the paper's implementation details. In extensive experiments on the PaperBench Code-Dev benchmark, RePro achieves a 13.0% performance gap over baselines, and during reflection it correctly revises complex logical and mathematical criteria, which demonstrates the effectiveness of the approach.
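The verification and refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not RePro's implementation: the `Criterion` class, the `verify` and `reflect_and_refine` functions, the `max_rounds` parameter, and the toy string-based checks and optimizer settings are all hypothetical and chosen only for illustration. In a real system, the checks and the `revise` step would presumably be carried out by an LLM agent rather than by string matching.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Criterion:
    """One atomic, checkable statement about the paper (hypothetical shape)."""
    description: str
    check: Callable[[str], bool]  # True when the code satisfies the criterion


def verify(code: str, fingerprint: List[Criterion]) -> List[Criterion]:
    """Return the criteria that the current code does not satisfy."""
    return [c for c in fingerprint if not c.check(code)]


def reflect_and_refine(
    code: str,
    fingerprint: List[Criterion],
    revise: Callable[[str, List[Criterion]], str],
    max_rounds: int = 3,  # assumed budget, not taken from the paper
) -> Tuple[str, List[Criterion]]:
    """Alternate verification and targeted revision until all criteria pass
    or the round budget is exhausted. Returns the code and any unmet criteria."""
    failed = verify(code, fingerprint)
    rounds = 0
    while failed and rounds < max_rounds:
        code = revise(code, failed)  # revision targets only the failed criteria
        failed = verify(code, fingerprint)
        rounds += 1
    return code, failed


# Toy fingerprint with illustrative values (not from the paper).
fingerprint = [
    Criterion("optimizer is AdamW", lambda src: "AdamW" in src),
    Criterion("learning rate is 3e-4", lambda src: "lr=3e-4" in src),
]


def toy_revise(code: str, failed: List[Criterion]) -> str:
    """Stand-in for an agent's revision step: apply one canned fix per failure."""
    fixes = {
        "optimizer is AdamW": ("SGD", "AdamW"),
        "learning rate is 3e-4": ("lr=0.1", "lr=3e-4"),
    }
    for criterion in failed:
        old, new = fixes[criterion.description]
        code = code.replace(old, new)
    return code


draft = "optimizer = SGD(params, lr=0.1)"
final_code, unmet = reflect_and_refine(draft, fingerprint, toy_revise)
```

Keeping each criterion atomic means a failed check points at one specific discrepancy, so the revision step can be targeted instead of regenerating the whole program.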

Mingyang Zhou, Quanming Yao, Lun Du, Lanning Wei, Da Zheng • 2025

Related benchmarks

Task	Dataset	Metric	Result	Rank
Paper-to-Code Reproduction	PaperBench Code (dev)	Final Score	61.4	9
Paper-to-Code Reproduction	PaperBench Code ICML 2024 (dev)	Average Score	0.626	6
