
ControlLLM: Augment Language Models with Tools by Searching on Graphs

About

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches for the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods. The code is at https://github.com/OpenGVLab/ControlLLM.
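
The idea of searching for a solution path on a tool graph can be illustrated with a small sketch. This is not the ControlLLM implementation: the tool names, the resource types, the `search_paths` and `best_path` helpers, and the "fewest tools wins" criterion are all assumptions made for illustration. Each hypothetical tool declares the resource types it consumes and the one it produces, and a depth-first search enumerates tool sequences whose last step yields the requested output type.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Tool:
    name: str
    inputs: Tuple[str, ...]  # resource types the tool consumes
    output: str              # resource type the tool produces


# Hypothetical toolbox; names and types are illustrative only.
TOOLBOX = [
    Tool("image_captioning", ("image",), "text"),
    Tool("text_to_speech", ("text",), "audio"),
    Tool("object_detection", ("image",), "bbox"),
    Tool("crop_by_bbox", ("image", "bbox"), "image"),
]


def search_paths(available, target, toolbox, max_depth=4):
    """Depth-first search for tool sequences whose last tool outputs `target`."""
    solutions: List[List[Tool]] = []

    def dfs(resources: FrozenSet[str], path: List[Tool]) -> None:
        # A path is a solution once its final tool produces the target type.
        if path and path[-1].output == target:
            solutions.append(list(path))
            return
        if len(path) >= max_depth:
            return
        for tool in toolbox:
            if tool in path:
                continue  # use each tool at most once per path
            # A tool is applicable only if all of its inputs are available.
            if all(t in resources for t in tool.inputs):
                dfs(resources | {tool.output}, path + [tool])

    dfs(frozenset(available), [])
    return solutions


def best_path(available, target, toolbox) -> Optional[List[Tool]]:
    """Pick the shortest solution (an assumed stand-in for 'optimal')."""
    paths = search_paths(available, target, toolbox)
    return min(paths, key=len) if paths else None


path = best_path({"image"}, "audio", TOOLBOX)
print([tool.name for tool in path])
# prints ['image_captioning', 'text_to_speech']
```

In this toy setup, a subtask with an image as input and audio as the desired output resolves to captioning followed by speech synthesis. A real system would also need to rank candidate paths by more than length and bind concrete arguments to each tool.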

Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang • 2023

Related benchmarks

Task            Dataset                   Result            Rank
Task Planning   TaskBench Daily Life      Node-F1 97.36     25
Task Planning   Hugging Face              Node-F1 77.56     25
Task Planning   TaskBench Multimedia      Node F1 88.16     25
Task Planning   RestBench TMDB            Node F1 81.78     25
Planning        UltraTool (test)          n-F1 71.99        24
Task Planning   Hugging Face v1 (test)    n-F1 41.06        17
