DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

About

Automated peer review is often framed as generating fluent critique, yet reviewers and area chairs need judgments they can audit: where a concern applies, what evidence supports it, and what concrete follow-up is required. DeepReviewer 2.0 is a process-controlled agentic review system built around an output contract: it produces a traceable review package with anchored annotations, localized evidence, and executable follow-up actions, and it exports only after meeting minimum traceability and coverage budgets. Concretely, it first builds a manuscript-only claim–evidence–risk ledger and verification agenda, then performs agenda-driven retrieval and writes anchored critiques under an export gate. On 134 ICLR 2025 submissions under three fixed protocols, an un-finetuned 196B model running DeepReviewer 2.0 outperforms Gemini-3.1-Pro-preview, improving strict major-issue coverage (37.26% vs. 23.57%) and winning 71.63% of micro-averaged blind comparisons against a human review committee, while ranking first among automatic systems in our pool. We position DeepReviewer 2.0 as an assistive tool rather than a decision proxy, and note remaining gaps such as ethics-sensitive checks.
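The pipeline described above (ledger, verification agenda, anchored critiques, export gate) can be illustrated with a small sketch. Everything here is an assumption for illustration: the record types `LedgerEntry` and `Annotation`, the agenda rule in `build_agenda`, and the budget values `min_traceable=0.9` and `min_coverage=0.8` are hypothetical and do not come from the paper, which does not publish its schema or thresholds.

```python
from dataclasses import dataclass


# Hypothetical ledger row: one manuscript claim, the manuscript-internal
# evidence offered for it ("" if none), and an assumed three-level risk scale.
@dataclass
class LedgerEntry:
    claim: str
    evidence: str
    risk: str  # "low" | "medium" | "high" (assumed)


# Hypothetical anchored annotation in the review package.
@dataclass
class Annotation:
    agenda_item: int  # index into the verification agenda
    anchor: str       # where in the manuscript the concern applies
    evidence: str     # localized evidence supporting the concern
    action: str       # concrete follow-up the authors could execute


def build_agenda(ledger: list[LedgerEntry]) -> list[LedgerEntry]:
    # Assumed rule: verify every high-risk claim and every unsupported claim.
    return [e for e in ledger if e.risk == "high" or not e.evidence]


def is_traceable(a: Annotation) -> bool:
    # An annotation counts as traceable only if anchor, evidence and action are all present.
    return bool(a.anchor and a.evidence and a.action)


def export_gate(agenda: list[LedgerEntry], annotations: list[Annotation],
                min_traceable: float = 0.9, min_coverage: float = 0.8) -> bool:
    # Block export of an empty review outright.
    if not annotations:
        return False
    # Traceability budget: share of annotations that are fully anchored.
    traceable = sum(is_traceable(a) for a in annotations) / len(annotations)
    # Coverage budget: share of agenda items addressed by a traceable annotation.
    covered = {a.agenda_item for a in annotations if is_traceable(a)}
    coverage = len(covered & set(range(len(agenda)))) / len(agenda) if agenda else 1.0
    return traceable >= min_traceable and coverage >= min_coverage


# Usage with made-up manuscript content.
ledger = [
    LedgerEntry("Method beats the baseline", "Table 2", "high"),
    LedgerEntry("Inference runs in linear time", "", "medium"),
    LedgerEntry("The dataset is public", "Sec. 3", "low"),
]
agenda = build_agenda(ledger)  # the first two claims
annotations = [
    Annotation(0, "Sec. 5, Table 2", "No variance over seeds is reported",
               "Report mean and std over at least 3 seeds"),
    Annotation(1, "Sec. 3.4", "Complexity claim has no proof or timing",
               "Add a complexity derivation or a runtime plot"),
]
print(export_gate(agenda, annotations))      # True: both budgets met
print(export_gate(agenda, annotations[:1]))  # False: coverage is 0.5
```

Gating on two separate budgets means a review cannot pass by being well anchored but narrow, or broad but unanchored; how the actual system weighs these is not specified in the abstract.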

Yixuan Weng, Minjun Zhu, Qiujie Xie, Zhiyuan Ning, Shichen Li, Panzhong Lu, Zhen Lin, Enhao Gu, Qiyao Sun, Yue Zhang• 2026

Related benchmarks

Task	Dataset	Dimension	Model	Metric	Value (%)	Rank
Peer Review Evaluation	Anonymous Peer Review Dataset	All Dimensions (micro)	DeepReviewer 2.0	Win Rate	71.63	1
Peer Review Evaluation	Anonymous Peer Review Dataset	Technical Accuracy	DeepReviewer 2.0	Win Rate	59.69	1
Peer Review Evaluation	Anonymous Peer Review Dataset	Constructive Value	DeepReviewer 2.0	Win Rate	84.50	1
Peer Review Evaluation	Anonymous Peer Review Dataset	Analytical Depth	DeepReviewer 2.0	Win Rate	58.14	1
Peer Review Evaluation	Anonymous Peer Review Dataset	Communication Clarity	DeepReviewer 2.0	Win Rate	86.05	1
Peer Review Evaluation	Anonymous Peer Review Dataset	Overall Judgment	DeepReviewer 2.0	Win Rate	69.77	1
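The All Dimensions (micro) figure is consistent with the five per-dimension win rates in the table. A micro average pools every blind comparison before dividing; if each dimension contributes the same number of comparisons (an assumption, since per-dimension counts are not listed here), that pooled rate equals the plain mean of the per-dimension rates. The short check below computes that mean from the table's values.

```python
# Per-dimension win rates (%) of DeepReviewer 2.0, copied from the table.
win_rates = {
    "Technical Accuracy": 59.69,
    "Constructive Value": 84.50,
    "Analytical Depth": 58.14,
    "Communication Clarity": 86.05,
    "Overall Judgment": 69.77,
}
reported_micro = 71.63  # "All Dimensions (micro)" row

# Assumption: equal comparison counts per dimension, so the pooled (micro)
# win rate reduces to the unweighted mean of the per-dimension rates.
mean_rate = sum(win_rates.values()) / len(win_rates)
print(f"{mean_rate:.2f}")  # 71.63
```

If the dimensions had unequal comparison counts, the micro average would instead weight each rate by its count, and the two numbers could differ.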
