AI-Researcher: Autonomous Scientific Innovation

About

The powerful reasoning capabilities of Large Language Models (LLMs) in mathematics and coding, combined with their ability to automate complex tasks through agentic frameworks, present unprecedented opportunities for accelerating scientific innovation. In this paper, we introduce AI-Researcher, a fully autonomous research system that transforms how AI-driven scientific discovery is conducted and evaluated. Our framework seamlessly orchestrates the complete research pipeline--from literature review and hypothesis generation to algorithm implementation and publication-ready manuscript preparation--with minimal human intervention. To rigorously assess autonomous research capabilities, we develop Scientist-Bench, a comprehensive benchmark comprising state-of-the-art papers across diverse AI research domains, featuring both guided innovation and open-ended exploration tasks. Through extensive experiments, we demonstrate that AI-Researcher achieves remarkable implementation success rates and produces research papers that approach human-level quality. This work establishes new foundations for autonomous scientific innovation that can complement human researchers by systematically exploring solution spaces beyond cognitive limitations.

Jiabin Tang, Lianghao Xia, Zhonghang Li, Chao Huang • 2025

Related benchmarks

Task	Dataset	Metric	Value	(unlabeled)
Scientific Discovery	SPO	SQ (%)	36.95	14
Scientific Discovery	TMC	Solution Quality	83.79	14
Scientific Discovery	NHO	Solution Quality (SQ)	0.7905	14
Scientific Discovery	MBO	Solution Quality	69.72	14
Scientific Discovery	Average MBO, NHO, SPO, TMC	Avg APD	24.9	14
Idea Generation Assessment	AI-Idea-Bench 2025	--	--	12
Automated Research	20 LLM-simulated scientists	Alignment Score	4.206	7
Research Automation	Three real research tasks Human researcher evaluation	Alignment	4.333	7
AI Research Paper Generation Evaluation	Public papers (Overall)	Soundness Score	1.86	6
Scientific Paper Generation	AI-generated public papers Max Rating Paper	Soundness	2.25	6

Showing 10 of 13 rows
