OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

About

Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains, limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible multi-agent framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6% when given the same set of tools. Through comprehensive analysis, ablations, and robustness tests with compact backbones and noisy tool environments, OctoTools demonstrates advantages in task planning, effective tool usage, and multi-step problem solving. Code, demos, and visualization are publicly available at https://octotools.github.io/.
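
To make the three components named in the abstract more concrete, the following is a minimal sketch of how a tool card, a planner, and an executor might fit together. It is not the OctoTools API: every name here (`ToolCard`, `plan_next_step`, `execute`, `solve`, `TOOLS`) is hypothetical, and the planner is a fixed rule standing in for what would presumably be an LLM call in the actual framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata describing a tool plus the callable itself."""
    name: str
    description: str
    run: Callable[[str], str]


def plan_next_step(history: List[Tuple[str, str]],
                   tools: Dict[str, ToolCard]) -> Optional[str]:
    """Toy planner (assumption): choose the first tool that has not been used yet.

    Returns None when every tool has been tried, which ends the loop.
    """
    used = {name for name, _ in history}
    for name in tools:
        if name not in used:
            return name
    return None


def execute(tool: ToolCard, query: str) -> str:
    """Toy executor: call the selected tool on the query."""
    return tool.run(query)


def solve(query: str, tools: Dict[str, ToolCard],
          max_steps: int = 5) -> List[Tuple[str, str]]:
    """Alternate planning and execution, recording (tool name, result) pairs."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        name = plan_next_step(history, tools)
        if name is None:
            break
        history.append((name, execute(tools[name], query)))
    return history


# Two illustrative tools; adding a tool only requires registering another card.
TOOLS: Dict[str, ToolCard] = {
    "sum_numbers": ToolCard(
        name="sum_numbers",
        description="Adds the whitespace-separated integers found in the query.",
        run=lambda q: str(sum(int(t) for t in q.split() if t.isdigit())),
    ),
    "count_words": ToolCard(
        name="count_words",
        description="Counts whitespace-separated words in the query.",
        run=lambda q: str(len(q.split())),
    ),
}

if __name__ == "__main__":
    for tool_name, result in solve("add 2 and 40", TOOLS):
        print(tool_name, result)
```

In this sketch, extending the toolset means adding an entry to `TOOLS` with no retraining step, which is the property the abstract describes as training-free and easily extensible.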

Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou • 2025

Related benchmarks

Task                                       Dataset          Result            Rank
Visual Question Answering                  GQA              Accuracy 68.58    1249
Text-based Visual Question Answering       TextVQA          Accuracy 77.17    807
Multi-hop Question Answering               HotpotQA         --                294
Science Question Answering                 ScienceQA (SQA)  Accuracy 84.13    273
Mathematical Multimodal Reasoning          MathVista        Accuracy 61.7     218
Medical Visual Question Answering          VQA-RAD          Accuracy 66.42    198
Medical Question Answering                 MedQA            Accuracy 92.17    153
Document Visual Question Answering         DocVQA           Accuracy 89.39    132
Mathematical Reasoning                     Game of 24       Accuracy 40.18    103
Knowledge-based Visual Question Answering  OKVQA            Accuracy 0.5342   79
Showing 10 of 27 rows
