The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

About

AI is increasingly playing a pivotal role in transforming how scientific discoveries are made. We introduce The AI Scientist-v2, an end-to-end agentic system capable of producing the first entirely AI-generated, peer-review-accepted workshop paper. This system iteratively formulates scientific hypotheses, designs and executes experiments, analyzes and visualizes data, and autonomously authors scientific manuscripts. Compared to its predecessor (v1; Lu et al., 2024, arXiv:2408.06292), The AI Scientist-v2 eliminates the reliance on human-authored code templates, generalizes effectively across diverse machine learning domains, and leverages a novel progressive agentic tree-search methodology managed by a dedicated experiment manager agent. Additionally, we enhance the AI reviewer component by integrating a Vision-Language Model (VLM) feedback loop for iterative refinement of the content and aesthetics of the figures. We evaluated The AI Scientist-v2 by submitting three fully autonomous manuscripts to a peer-reviewed ICLR workshop. Notably, one manuscript achieved high enough scores to exceed the average human acceptance threshold, marking the first instance of a fully AI-generated paper successfully navigating peer review. This accomplishment highlights the growing capability of AI in conducting all aspects of scientific research. We anticipate that further advancements in autonomous scientific discovery technologies will profoundly impact human knowledge generation, enabling unprecedented scalability in research productivity and significantly accelerating scientific breakthroughs, greatly benefiting society at large. We have open-sourced the code at https://github.com/SakanaAI/AI-Scientist-v2 to foster the future development of this transformative technology. We also discuss the role of AI in science, including AI safety.

Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, David Ha • 2025
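
The abstract mentions a tree search over experiments but gives no algorithmic detail. As a rough illustration of the general idea only, the sketch below implements a generic best-first tree search over candidate experiment plans. It is not the paper's method: the `Node` structure, the `propose` and `evaluate` callbacks, and the `budget` and `branching` parameters are all hypothetical names introduced here, and the toy scoring function stands in for actually running an experiment.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One candidate experiment plan in the search tree (hypothetical structure)."""
    plan: str
    score: float
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def tree_search(
    root_plan: str,
    propose: Callable[[str, int], List[str]],
    evaluate: Callable[[str], float],
    budget: int = 10,
    branching: int = 2,
) -> Node:
    """Generic best-first search: always expand the highest-scoring open node.

    `propose` and `evaluate` are assumed callbacks; in an agentic system they
    might wrap an LLM call and an experiment run, respectively.
    """
    root = Node(root_plan, evaluate(root_plan))
    best = root
    counter = 0  # tie-breaker so the heap never has to compare Node objects
    # heapq is a min-heap, so scores are negated to pop the best node first.
    frontier = [(-root.score, counter, root)]
    expansions = 0
    while frontier and expansions < budget:
        _, _, node = heapq.heappop(frontier)
        expansions += 1
        for plan in propose(node.plan, branching):
            child = Node(plan, evaluate(plan), parent=node)
            node.children.append(child)
            counter += 1
            heapq.heappush(frontier, (-child.score, counter, child))
            if child.score > best.score:
                best = child
    return best


# Toy stand-ins: a "plan" is a string, and each step appends one letter.
def toy_propose(plan: str, k: int) -> List[str]:
    return [plan + c for c in "ab"[:k]]


def toy_evaluate(plan: str) -> float:
    return plan.count("a") - 0.5 * plan.count("b")


if __name__ == "__main__":
    best = tree_search("", toy_propose, toy_evaluate, budget=3, branching=2)
    print(best.plan, best.score)
```

With the toy callbacks, each expansion follows the branch that appends "a", since that branch always scores highest. A real system would also need to handle failed runs and decide when to stop refining a branch, which this sketch does not attempt.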

Related benchmarks

Task | Dataset | Metric | Result | Rank
Automated Peer Review Evaluation | DeepReview-13K 1.0 (test) | H-Max Technical Accuracy | 7.59 | 30
Scientific Discovery | TMC | Solution Quality | 84.86 | 14
Automated Peer Review | DeepReview-13K 2025 (test) | Technical Accuracy Win | 49.2 | 14
Scientific Discovery | MBO | Solution Quality | 76.56 | 14
Scientific Discovery | Spo | SQ (%) | 36.74 | 14
Scientific Discovery | Average MBO, NHO, SPO, TMC | Avg APD | 22.6 | 14
Scientific Discovery | NHO | Solution Quality (SQ) | 0.7785 | 14
Nanophotonic Helix Optimization | NHO (test) | SQ | 71.5 | 5