A Cloud-based Multi-Agentic Workflow for Science
About
As Large Language Models (LLMs) become ubiquitous across scientific domains, their inability to perform complex tasks such as running simulations, or to make complex decisions, limits their utility. LLM-based agents bridge this gap because they can call external resources and tools, and they are rapidly gaining popularity as a result. However, designing a workflow that balances models, cloud providers, and external resources is very challenging, which can make implementing an agentic system more of a hindrance than a help. In this work, we present a domain-agnostic, model-independent workflow for an agentic framework that acts as a scientific assistant and runs entirely in the cloud. Built around a supervisor agent that marshals an array of agents with individual capabilities, our framework brings together straightforward tasks such as literature review and data analysis with more complex ones such as simulation runs. We describe the framework in full, including a proof-of-concept system we built to accelerate the study of catalysts, a topic of high importance in chemistry and materials science. We report the cost to operate and use this framework, including a breakdown of the cost by service used. We also evaluate our system on a custom-curated synthetic benchmark and a popular chemistry benchmark, and perform expert validation of the system. The results show that our system routes a task to the correct agent 90% of the time and successfully completes the assigned task 97.5% of the time for synthetic tasks and 91% of the time for real-world tasks, while achieving accuracy better than or comparable to most frontier models. This suggests that the framework is viable for other scientific domains to replicate.
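The supervisor pattern described above can be illustrated with a minimal sketch. This is not the authors' implementation: the agent names, the keyword lists, and the `Agent`, `Supervisor`, and `build_supervisor` helpers are all hypothetical, and simple keyword matching stands in for whatever routing mechanism the real system uses (presumably an LLM call).

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Agent:
    """A worker agent with one capability (hypothetical structure)."""
    name: str
    keywords: Tuple[str, ...]      # stand-in for an LLM-based routing decision
    handle: Callable[[str], str]   # stand-in for the agent's real tool calls


class Supervisor:
    """Routes each task to the agent whose keywords match it best."""

    def __init__(self, agents: Sequence[Agent], fallback: Agent) -> None:
        self.agents = list(agents)
        self.fallback = fallback

    def route(self, task: str) -> Agent:
        text = task.lower()
        # Score every agent by how many of its keywords occur in the task.
        scores = {a.name: sum(k in text for k in a.keywords) for a in self.agents}
        best = max(self.agents, key=lambda a: scores[a.name])
        # If nothing matched, hand the task to the fallback agent.
        return best if scores[best.name] > 0 else self.fallback

    def run(self, task: str) -> str:
        agent = self.route(task)
        return f"[{agent.name}] {agent.handle(task)}"


def build_supervisor() -> Supervisor:
    # Assumed agent set, mirroring the task types named in the abstract.
    agents = [
        Agent("literature", ("paper", "literature", "review"),
              lambda t: "searching the literature"),
        Agent("data_analysis", ("plot", "analy", "dataset"),
              lambda t: "analyzing data"),
        Agent("simulation", ("simulat", "dft", "relax"),
              lambda t: "submitting a simulation job"),
    ]
    fallback = Agent("general", (), lambda t: "answering directly")
    return Supervisor(agents, fallback)


if __name__ == "__main__":
    supervisor = build_supervisor()
    print(supervisor.run("Run a DFT simulation of CO adsorption on Pt"))
```

Keeping the routing decision in a single `route` method means the keyword scorer could be swapped for a model call without touching the agents, which is one way to read the abstract's claim of model independence.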
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Chemistry Question Answering | ChemBench Analytical Chemistry | Success Rate | 82.2 | 1 |
| Chemistry Question Answering | ChemBench Chemical Preference | Success Rate | 93.1 | 1 |
| Chemistry Question Answering | ChemBench General Chemistry | Success Rate | 0.872 | 1 |
| Chemistry Question Answering | ChemBench Inorganic Chemistry | Success Rate | 91.3 | 1 |
| Chemistry Question Answering | ChemBench Materials Science | Success Rate | 90.5 | 1 |
| Chemistry Question Answering | ChemBench Organic Chemistry | Success Rate | 92.1 | 1 |
| Chemistry Question Answering | ChemBench Physical Chemistry | Success Rate | 8.42e+3 | 1 |
| Chemistry Question Answering | ChemBench Technical Chemistry | Success Rate | 0.825 | 1 |
| Chemistry Question Answering | ChemBench Toxicity and Safety | Success Rate | 90.1 | 1 |
| Chemistry Question Answering | ChemBench Overall | Success Rate | 90.5 | 1 |