The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

About

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI-generated peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scientific manuscripts. Compared to its predecessor (v1, Lu et al., 2024, arXiv:2408.06292), The AI Scientist-v2 eliminates the reliance on human-authored code templates, generalizes effectively across diverse machine learning domains, and leverages a novel progressive agentic tree-search methodology managed by a dedicated experiment manager agent. Additionally, we enhance the AI reviewer component by integrating a Vision-Language Model (VLM) feedback loop for iterative refinement of the content and aesthetics of figures. We evaluated The AI Scientist-v2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved scores high enough to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating peer review. This accomplishment highlights the growing capability of AI in conducting all aspects of scientific research. We anticipate that further advancements in autonomous scientific discovery technologies will profoundly impact human knowledge generation, enabling unprecedented scalability in research productivity and significantly accelerating scientific breakthroughs, greatly benefiting society at large. We have open-sourced the code at https://github.com/SakanaAI/AI-Scientist-v2 to foster the future development of this transformative technology. We also discuss the role of AI in science, including AI safety.
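To make the general idea of tree search over experiments concrete, here is a minimal, generic best-first tree search sketch. It is not the authors' implementation and does not reproduce their progressive, multi-stage methodology. `Node`, `run_experiment`, and `propose_children` are hypothetical stand-ins: a toy scoring function replaces executing generated experiment code, and a fixed halve/double rule on a learning rate replaces agent-proposed refinements.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical experiment node: a config plus the score obtained by running it.
    config: dict
    score: float
    parent: "Node | None" = None
    children: list = field(default_factory=list)


def run_experiment(config):
    """Hypothetical stand-in for running an experiment; higher is better, peak at lr=0.01."""
    return -abs(math.log10(config["lr"]) - math.log10(0.01))


def propose_children(node):
    """Hypothetical stand-in for an agent proposing refinements of a parent experiment."""
    lr = node.config["lr"]
    return [{"lr": lr * 0.5}, {"lr": lr * 2.0}]


def tree_search(root_config, budget):
    """Best-first search: repeatedly expand the highest-scoring node until `budget` runs."""
    counter = itertools.count()  # tie-breaker so heapq never compares Node objects
    root = Node(root_config, run_experiment(root_config))
    frontier = [(-root.score, next(counter), root)]  # min-heap on negated score
    best = root
    evaluated = 1
    while frontier and evaluated < budget:
        _, _, parent = heapq.heappop(frontier)  # most promising node so far
        for cfg in propose_children(parent):
            if evaluated >= budget:
                break
            child = Node(cfg, run_experiment(cfg), parent)
            parent.children.append(child)
            evaluated += 1
            heapq.heappush(frontier, (-child.score, next(counter), child))
            if child.score > best.score:
                best = child
    return best, evaluated


if __name__ == "__main__":
    best, evaluated = tree_search({"lr": 1.0}, budget=20)
    print(evaluated, best.config)
```

Starting from `lr = 1.0` with a budget of 20 evaluations, the search follows the halving branch toward the toy optimum and returns the node with `lr = 0.0078125`. Keeping parent links in each node means the path of refinements that led to the best result can be traced back to the root.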

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, David Ha · 2025

Related benchmarks

Task	Dataset	Metric	Result
Scientific Manuscript Reviewing	ICLR 2026 (test)	Actionability Score	0.36	38
Automated Peer Review Evaluation	DeepReview-13K 1.0 (test)	H-Max Technical Accuracy	7.59	30
Literature Review Quality Assessment	PaperWritingBench	Citation Practices Score	49.55	16
Scientific Discovery	TMC	Solution Quality	84.86	14
Automated Peer Review	DeepReview-13K 2025 (test)	Technical Accuracy Win	49.2	14
Scientific Discovery	MBO	Solution Quality	76.56	14
Scientific Discovery	SPO	SQ (%)	36.74	14
Scientific Discovery	Average MBO, NHO, SPO, TMC	Avg APD	22.6	14
Scientific Discovery	NHO	Solution Quality (SQ)	0.7785	14
Scientific Idea Generation	AI-Scientist	Absolute Novelty	3.8	14

Showing 10 of 37 rows
