CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

About

Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization. CREATOR disentangles abstract tool creation and concrete decision execution, resulting in improved performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular contents, respectively. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and LLMs exhibit varying levels of tool creation abilities, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.

Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji • 2023
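
The separation the abstract describes, between creating a general tool and deciding how to apply it to a specific problem, can be illustrated with a minimal sketch. This is not the authors' released code: `fake_llm`, `create_tool`, `decide_and_execute`, and the canned tool `solve_linear` are all hypothetical names, and the model call is replaced by a stub that returns fixed text so the example runs offline.

```python
from typing import Any, Callable, Dict

# Canned outputs standing in for model generations (assumption: a real
# system would obtain these from an LLM instead of hard-coding them).
TOOL_SOURCE = '''
def solve_linear(a, b):
    """Documentation: return x such that a*x + b == 0 (a must be nonzero)."""
    return -b / a
'''
DECISION_SOURCE = "answer = solve_linear(2, -6)"


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    if prompt.startswith("CREATE"):
        return TOOL_SOURCE      # abstract stage: a reusable, documented tool
    return DECISION_SOURCE      # concrete stage: how to call it for this problem


def create_tool(problem: str, llm: Callable[[str], str]) -> Dict[str, Any]:
    """Abstract stage: ask for a general tool and load it into a namespace."""
    namespace: Dict[str, Any] = {}
    exec(llm("CREATE a tool for: " + problem), namespace)
    return namespace


def decide_and_execute(problem: str, tools: Dict[str, Any],
                       llm: Callable[[str], str]) -> Any:
    """Concrete stage: ask how to apply the tool, then run that decision."""
    scope = dict(tools)  # copy so the decision code cannot clobber the tools
    exec(llm("DECIDE how to use the tool for: " + problem), scope)
    return scope["answer"]


problem = "Solve 2x - 6 = 0"
tools = create_tool(problem, fake_llm)
answer = decide_and_execute(problem, tools, fake_llm)
print(answer)  # 3.0
```

Keeping the two stages in separate calls means the tool is written without reference to the specific numbers in the problem, so the same `solve_linear` could in principle be reused for other problems of the same type. Running model-generated code with `exec` is unsafe outside a sandbox; it is used here only to keep the sketch short.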

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Causality | QRData | Accuracy: 39.8 | 36 |
| Chemistry | SciBench | Accuracy: 60 | 32 |
| Physics | TheoremQA | Accuracy: 57 | 28 |
| Physics Problem Solving | Scibench fund | Accuracy: 70.4 | 24 |
| Multimodal Tool-use Reasoning | TRBench Mathematics | Alg. Accuracy: 79.57 | 7 |
| Multimodal Tool-use Reasoning | TRBench Total | Accuracy: 67.47 | 7 |
| Multimodal Tool-use Reasoning | TRBench Science | Accuracy (Biology): 76 | 7 |
| Multimodal Tool-use Reasoning | TRBench General sub-dataset | Cultural Understanding Score: 28 | 7 |