
MARG: Multi-Agent Review Generation for Scientific Papers

About

We study the ability of LLMs to generate feedback for scientific papers and develop MARG, a feedback generation approach using multiple LLM instances that engage in internal discussion. By distributing paper text across agents, MARG can consume the full text of papers beyond the input length limitations of the base LLM, and by specializing agents and incorporating sub-tasks tailored to different comment types (experiments, clarity, impact) it improves the helpfulness and specificity of feedback. In a user study, baseline methods using GPT-4 were rated as producing generic or very generic comments more than half the time, and only 1.7 comments per paper were rated as good overall in the best baseline. Our system substantially improves the ability of GPT-4 to generate specific and helpful feedback, reducing the rate of generic comments from 60% to 29% and generating 3.7 good comments per paper (a 2.2x improvement).
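The core mechanism described above — distributing the paper text across multiple agent instances and specializing agents by comment type — can be sketched in miniature. This is an illustrative outline only, not the authors' implementation: the chunking size, the specialty list, and the `llm` callable are all assumptions, and a real system would add the inter-agent discussion step that MARG describes.

```python
# Hypothetical sketch of a MARG-style multi-agent setup: the paper is split
# into chunks small enough for one agent's context window, and each chunk is
# reviewed once per specialty (experiments, clarity, impact). The `llm`
# parameter stands in for any text-in/text-out model call.

def chunk_paper(text: str, max_chars: int = 2000) -> list[str]:
    """Split the full paper text into agent-sized chunks."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

SPECIALTIES = ["experiments", "clarity", "impact"]

def run_multi_agent_review(paper_text: str, llm) -> list[str]:
    """Collect one comment per (specialty, chunk) pair from worker agents."""
    chunks = chunk_paper(paper_text)
    comments = []
    for specialty in SPECIALTIES:
        for chunk in chunks:
            prompt = (
                f"As a reviewer focusing on {specialty}, "
                f"comment on this excerpt:\n{chunk}"
            )
            comments.append(llm(prompt))
    return comments
```

In this sketch the number of model calls grows as (number of chunks) x (number of specialties); the actual system coordinates agents through internal discussion rather than independent calls, which is what drives the reported gains in specificity.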

Mike D'Arcy, Tom Hope, Larry Birnbaum, Doug Downey • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Review Feedback Generation | RMR-75K (val) | Pairwise Win Rate | 61.9 | 72
Scientific Review Feedback Generation | ICLR LLM-as-a-Judge 2025 (test) | Actionability Score | 3.19 | 9
Scientific rebuttal generation | Scientific Rebuttal Evaluation dataset (test) | BLEU@4 | 10.95 | 9
Scientific Review Feedback Generation | ICLR Human Evaluation 2025 (test) | Actionability | 3.2 | 9
