Agent Laboratory: Using LLM Agents as Research Assistants

About

Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages (literature review, experimentation, and report writing) to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluating the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) the generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.

Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, Emad Barsoum • 2025
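
The abstract describes a three-stage flow (literature review, experimentation, report writing) with optional human feedback at each stage. The control flow can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Agent Laboratory implementation: every name here (`STAGES`, `ResearchState`, `run_stage`, `run_pipeline`, `get_feedback`) is hypothetical, and `run_stage` is a stub that returns a placeholder string where a real system would call an LLM agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage names taken from the abstract's description;
# nothing below is drawn from the actual Agent Laboratory codebase.
STAGES = ("literature_review", "experimentation", "report_writing")


@dataclass
class ResearchState:
    idea: str
    outputs: dict = field(default_factory=dict)   # stage name -> stage output
    feedback: dict = field(default_factory=dict)  # stage name -> human note


def run_stage(stage: str, state: ResearchState) -> str:
    """Stub standing in for an LLM agent call (assumption: returns text)."""
    note = state.feedback.get(stage)
    suffix = f" (revised per feedback: {note})" if note else ""
    return f"{stage} output for '{state.idea}'{suffix}"


def run_pipeline(
    idea: str,
    get_feedback: Optional[Callable[[str, str], Optional[str]]] = None,
) -> ResearchState:
    """Run the stages in order; rerun a stage once if a human leaves a note."""
    state = ResearchState(idea=idea)
    for stage in STAGES:
        state.outputs[stage] = run_stage(stage, state)
        if get_feedback is not None:
            note = get_feedback(stage, state.outputs[stage])
            if note:
                state.feedback[stage] = note
                state.outputs[stage] = run_stage(stage, state)
    return state
```

Calling `run_pipeline("some idea")` runs fully autonomously, while passing a `get_feedback` callback models the human-in-the-loop mode. Rerunning a stage exactly once after feedback is a simplification chosen for the sketch; the abstract does not specify how feedback is incorporated.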

Related benchmarks

Task                         Dataset              Result                     Rank
Binary Classification        Random Pizza         Competition Score: 0.979   7
Binary Classification        Toxic Jigsaw         Competition Score: 0.987   7
Regression                   Trans. Conductors    Competition Score: 0.069   7
Tabular Classification       Tabular May 2022     Competition Score: 99.8    7
Text Normalization           English Text Norm.   Competition Score: 0.997   7
Text Normalization           Russ. Text Norm.     Competition Score: 99      7
Multi-class Classification   Spooky Author        Competition Score: 0.418   7
Binary Classification        Insult Detection     Competition Score: 83.3    7
Regression                   NYC Taxi             Competition Score: 3.597   7
Tabular Classification       Tabular Dec. 2021    Competition Score: 0.956   7