
MARG: Multi-Agent Review Generation for Scientific Papers

About

We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By distributing paper text across agents, MARG can consume the full text of papers beyond the input length limitations of the base LLM, and by specializing agents and incorporating sub-tasks tailored to different comment types (experiments, clarity, impact) it improves the helpfulness and specificity of feedback. In a user study, baseline methods using GPT-4 were rated as producing generic or very generic comments more than half the time, and only 1.7 comments per paper were rated as good overall in the best baseline. Our system substantially improves the ability of GPT-4 to generate specific and helpful feedback, reducing the rate of generic comments from 60% to 29% and generating 3.7 good comments per paper (a 2.2x improvement).
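The core mechanism described above — distributing the paper text across multiple agent instances and specializing agents by comment type — can be sketched in miniature. This is an illustrative outline only, not the authors' implementation: the chunking size, the specialty list, and the `llm` callable are all assumptions, and a real system would add the inter-agent discussion step that MARG describes.

```python
# Hypothetical sketch of a MARG-style multi-agent setup: the paper is split
# into chunks small enough for one agent's context window, and each chunk is
# reviewed once per specialty (experiments, clarity, impact). The `llm`
# parameter stands in for any text-in/text-out model call.

def chunk_paper(text: str, max_chars: int = 2000) -> list[str]:
    """Split the full paper text into agent-sized chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

SPECIALTIES = ["experiments", "clarity", "impact"]

def run_multi_agent_review(paper_text: str, llm) -> list[str]:
    """Collect one comment per (specialty, chunk) pair from worker agents."""
    chunks = chunk_paper(paper_text)
    comments = []
    for specialty in SPECIALTIES:
        for chunk in chunks:
            prompt = (
                f"As a reviewer focusing on {specialty}, "
                f"comment on this excerpt:\n{chunk}"
            )
            comments.append(llm(prompt))
    return comments
```

In this sketch the number of model calls grows as (number of chunks) x (number of specialties); the actual system coordinates agents through internal discussion rather than independent calls, which is what drives the reported gains in specificity.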

Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Review Feedback Generation | RMR-75K (val) | Pairwise Win Rate | 61.9 | 72
Scientific Review Feedback Generation | ICLR LLM-as-a-Judge 2025 (test) | Actionability Score | 3.19 | 9
Scientific rebuttal generation | Scientific Rebuttal Evaluation dataset (test) | BLEU@4 | 10.95 | 9
Scientific Review Feedback Generation | ICLR Human Evaluation 2025 (test) | Actionability | 3.2 | 9
