Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System

About

The rapid advancement of scientific progress requires innovative tools that can accelerate knowledge discovery. Although recent AI methods, particularly large language models (LLMs), have shown promise in tasks such as hypothesis generation and experimental design, they fall short of replicating the collaborative nature of real-world scientific practices, where diverse experts work together in teams to tackle complex problems. To address these limitations, we propose an LLM-based multi-agent system, i.e., Virtual Scientists (VirSci), designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. Through comprehensive experiments, we demonstrate that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas. We further investigate the collaboration mechanisms that contribute to its tendency to produce ideas with higher novelty, offering valuable insights to guide future research and illuminating pathways toward building a robust system for autonomous scientific discovery. The code is available at https://github.com/open-sciencelab/Virtual-Scientists.

Haoyang Su, Renqi Chen, Shixiang Tang, Zhenfei Yin, Xinzhe Zheng, Jinzhe Li, Biqing Qi, Qi Wu, Hui Li, Wanli Ouyang, Philip Torr, Bowen Zhou, Nanqing Dong • 2024

Related benchmarks

Task	Dataset	Result
Subjective evaluation of research ideas	100 Research Ideas 10 Benchmark Topics	Novelty Score 5.48	15
Research Idea Generation	Ten benchmark topics (100 generated research ideas)	Average Wins 4.07	15
Bioactivity-guided Molecule Generation	PMO-1K DRD2	Top-10 AUC 92.9	13
Bioactivity-guided Molecule Generation	PMO-1K GSK3β	Top-10 AUC 0.449	13
Bioactivity-guided Molecule Generation	PMO-1K JNK3	Top-10 AUC 0.185	13
Idea Generation Assessment	AI-Idea-Bench 2025	Motivation Score 3.95	12
Scientific Idea Generation	NeurIPS 2025	--	12
Scientific ideation	Scientific Ideation 60 samples human-validated (test)	Novelty 2.21	9
Research Proposal Generation	LiveIdeaBench held-out	Live Score 6.79	7
Research Proposal Generation	AI Idea Bench (AIIB) held-out 2025	AIIB Score 6.97	7

Showing 10 of 18 rows
