ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving
About
To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemToolAgent, an enhanced chemistry agent built on ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemToolAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that, for specialized chemistry tasks such as synthesis prediction, agents should be augmented with specialized tools; for general chemistry questions like those in exams, however, the agent's ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.
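To make "tool augmentation" concrete, the sketch below shows a generic tool-use loop: a policy (an LLM in a real agent) picks a tool, the tool's output is fed back as an observation, and the loop ends when the policy emits a final answer. This is an illustrative assumption, not ChemToolAgent's or ChemCrow's implementation; the tools `name_to_smiles` and `count_carbons`, the `scripted_policy` standing in for the model, and the `final_answer` action name are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy tools standing in for specialized chemistry tools
# (name lookup, property calculators, reaction predictors, ...).
NAME_TO_SMILES = {"ethanol": "CCO", "benzene": "c1ccccc1"}

def name_to_smiles(name: str) -> str:
    """Toy lookup tool; a real agent would query a database or web service."""
    return NAME_TO_SMILES.get(name.lower(), "unknown")

def count_carbons(smiles: str) -> str:
    """Toy property tool: counts 'C' and aromatic 'c' atoms, skipping 'Cl'.
    Only valid for simple SMILES without other C-prefixed elements."""
    count = 0
    for i, ch in enumerate(smiles):
        if ch == "c" or (ch == "C" and smiles[i + 1 : i + 2] != "l"):
            count += 1
    return str(count)

TOOLS: Dict[str, Callable[[str], str]] = {
    "name_to_smiles": name_to_smiles,
    "count_carbons": count_carbons,
}

Step = Tuple[str, str, str]  # (tool name, tool input, observation)
Policy = Callable[[str, List[Step]], Tuple[str, str]]

def run_agent(question: str, policy: Policy, max_steps: int = 5) -> str:
    """Ask the policy for an action, run the chosen tool, record the
    observation, and repeat until a final answer or the step budget runs out."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(question, history)
        if action == "final_answer":
            return arg
        if action in TOOLS:
            observation = TOOLS[action](arg)
        else:
            observation = f"error: unknown tool {action!r}"
        history.append((action, arg, observation))
    return "no answer within step budget"

def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str]:
    """Hard-coded stand-in for an LLM (ignores the question text):
    look up the SMILES, count carbons, then answer."""
    if not history:
        return "name_to_smiles", "ethanol"
    if len(history) == 1:
        return "count_carbons", history[-1][2]
    return "final_answer", history[-1][2]

print(run_agent("How many carbon atoms does ethanol have?", scripted_policy))  # prints 2
```

The abstract's finding maps onto this loop: tools help when the policy picks the right specialized tool, but the final answer still depends on how well the model reasons over the observations it receives.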
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Molecule captioning | ChEBI-20 MM (test) | BLEU-2 | 0.63 | 12 |
| Reaction prediction | USPTO-MIT | Exact Match | 78 | 12 |
| Text-based molecule design | ChEBI-20-MM | Exact Match | 28 | 11 |
| Molecular property prediction | MoleculeNet BBBP | Accuracy | 90 | 9 |
| Molecular property prediction | MoleculeNet ClinTox | Accuracy | 82 | 9 |
| Molecular property prediction | MoleculeNet HIV | Accuracy | 94 | 9 |
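The Exact Match and Accuracy columns both count the fraction of examples scored as correct. Assuming the values are percentages (the table does not state units), a minimal sketch of the computation looks like the following. The function name `exact_match_percent` is hypothetical, and the plain string comparison is a simplification: reaction-prediction evaluations usually canonicalize SMILES (for example with RDKit) before comparing, so that equivalent strings such as `OCC` and `CCO` count as the same molecule.

```python
from typing import Sequence

def exact_match_percent(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference after stripping
    surrounding whitespace. No SMILES canonicalization is performed here."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)

preds = ["CCO", "c1ccccc1", "OCC", "CC(=O)O"]
refs = ["CCO", "c1ccccc1", "CCO", "CC(=O)O"]
print(exact_match_percent(preds, refs))  # 75.0: "OCC" is ethanol but fails string equality
```

For the MoleculeNet rows, the same formula applied to predicted versus true class labels gives classification accuracy.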