DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

About

Automated peer review is often framed as generating fluent critique, yet reviewers and area chairs need judgments they can audit: where a concern applies, what evidence supports it, and what concrete follow-up is required. DeepReviewer 2.0 is a process-controlled agentic review system built around an output contract: it produces a traceable review package with anchored annotations, localized evidence, and executable follow-up actions, and it exports only after meeting minimum traceability and coverage budgets. Concretely, it first builds a manuscript-only claim-evidence-risk ledger and verification agenda, then performs agenda-driven retrieval and writes anchored critiques under an export gate. On 134 ICLR 2025 submissions under three fixed protocols, an un-finetuned 196B model running DeepReviewer 2.0 outperforms Gemini-3.1-Pro-preview, improving strict major-issue coverage (37.26% vs. 23.57%) and winning 71.63% of micro-averaged blind comparisons against a human review committee, while ranking first among automatic systems in our pool. We position DeepReviewer 2.0 as an assistive tool rather than a decision proxy, and note remaining gaps such as ethics-sensitive checks.
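To make the export-gate idea concrete, here is a minimal sketch of how a review package might be checked against traceability and coverage budgets before export. It is an illustration only, not the authors' implementation: the `Annotation` fields, the `passes_export_gate` function, and the threshold values (0.9 and 0.8) are all assumptions invented for this example, and the abstract does not specify the actual budgets or data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Annotation:
    """One critique in a hypothetical review package (field names are assumed)."""
    claim_id: str              # ledger claim the critique addresses
    anchor: Optional[str]      # manuscript location, e.g. "Sec. 4.2"
    evidence: Optional[str]    # localized evidence supporting the concern
    follow_up: Optional[str]   # concrete action requested of the authors


def is_traceable(a: Annotation) -> bool:
    """An annotation counts as traceable only if all three parts are present."""
    return all((a.anchor, a.evidence, a.follow_up))


def passes_export_gate(
    annotations: List[Annotation],
    agenda: Set[str],
    min_traceable: float = 0.9,   # assumed budget, not from the paper
    min_coverage: float = 0.8,    # assumed budget, not from the paper
) -> bool:
    """Return True if the package meets both (hypothetical) budgets."""
    if not annotations or not agenda:
        return False
    traceable = [a for a in annotations if is_traceable(a)]
    # Share of critiques that carry an anchor, evidence, and a follow-up action.
    traceability = len(traceable) / len(annotations)
    # Share of agenda items addressed by at least one traceable critique.
    covered = {a.claim_id for a in traceable} & agenda
    coverage = len(covered) / len(agenda)
    return traceability >= min_traceable and coverage >= min_coverage


agenda = {"C1", "C2"}
draft = [
    Annotation("C1", "Sec. 3", "Table 2 lacks a baseline", "Add the baseline run"),
    Annotation("C2", None, None, None),  # fluent but unanchored critique
]
final = [
    draft[0],
    Annotation("C2", "Sec. 5.1", "Eq. 7 omits a term", "Re-derive Eq. 7"),
]
print(passes_export_gate(draft, agenda))   # False: half the critiques are unanchored
print(passes_export_gate(final, agenda))   # True: both budgets are met
```

The point of gating on both ratios is that a package cannot pass by writing a few well-anchored critiques while ignoring most of the agenda, nor by touching every agenda item with unanchored commentary.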

Yixuan Weng, Minjun Zhu, Qiujie Xie, Zhiyuan Ning, Shichen Li, Panzhong Lu, Zhen Lin, Enhao Gu, Qiyao Sun, Yue Zhang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Peer Review Evaluation | Anonymous Peer Review Dataset (All Dimensions, micro) | DeepReviewer 2.0 Win Rate: 71.63 | 1 |
| Peer Review Evaluation | Anonymous Peer Review Dataset (Technical Accuracy) | DeepReviewer 2.0 Win Rate: 59.69 | 1 |
| Peer Review Evaluation | Anonymous Peer Review Dataset (Constructive Value) | DeepReviewer 2.0 Win Rate: 84.5 | 1 |
| Peer Review Evaluation | Anonymous Peer Review Dataset (Analytical Depth) | DeepReviewer 2.0 Win Rate: 58.14 | 1 |
| Peer Review Evaluation | Anonymous Peer Review Dataset (Communication Clarity) | DeepReviewer 2.0 Win Rate: 86.05 | 1 |
| Peer Review Evaluation | Anonymous Peer Review Dataset (Overall Judgment) | DeepReviewer 2.0 Win Rate: 69.77 | 1 |
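As a quick arithmetic check on the figures above, the unweighted mean of the five per-dimension win rates reproduces the 71.63 all-dimensions value. That is what a micro-average reduces to when every dimension contributes the same number of comparisons. The equal per-dimension count used below (`n = 100`) is an assumption for illustration; the actual counts are not given on this page.

```python
# Per-dimension win rates (%) as listed in the benchmark rows.
win_rates = {
    "Technical Accuracy": 59.69,
    "Constructive Value": 84.5,
    "Analytical Depth": 58.14,
    "Communication Clarity": 86.05,
    "Overall Judgment": 69.77,
}


def micro_average(rates, counts):
    """Pool all comparisons: weight each rate by its number of comparisons."""
    total = sum(counts[k] for k in rates)
    return sum(rates[k] * counts[k] for k in rates) / total


# Macro-average: plain mean over dimensions.
macro = sum(win_rates.values()) / len(win_rates)

# Hypothetical equal counts per dimension (assumed, not from the source).
n = 100
counts = {name: n for name in win_rates}
micro = micro_average(win_rates, counts)

print(round(macro, 2))  # 71.63
print(round(micro, 2))  # 71.63
```

If the dimensions had unequal comparison counts, the two averages would diverge, so the match here is consistent with (but does not prove) equal counts per dimension.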
