
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System

About

Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving, yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness in LLM-based MAS through LLM training. Optima employs an iterative generate, rank, select, and train paradigm with a reward function balancing task performance, token efficiency, and communication readability. We explore various RL algorithms, including Supervised Fine-Tuning, Direct Preference Optimization, and their hybrid approaches, providing insights into their effectiveness-efficiency trade-offs. We integrate Monte Carlo Tree Search-inspired techniques for DPO data generation, treating conversation turns as tree nodes to explore diverse interaction paths. Evaluated on common multi-agent tasks, including information-asymmetric question answering and complex reasoning, Optima shows consistent and substantial improvements over single-agent baselines and vanilla MAS based on Llama 3 8B, achieving up to a 2.8x performance gain with less than 10% of the tokens on tasks requiring heavy information exchange. Moreover, Optima's efficiency gains open new possibilities for leveraging inference compute more effectively, leading to improved inference-time scaling laws. By addressing fundamental challenges in LLM-based MAS, Optima shows the potential for scalable, efficient, and effective MAS (https://chenweize1998.github.io/optima-project-page).
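To make the rank and select steps concrete, here is a minimal sketch of a reward that trades off task performance, token usage, and readability, followed by top-k selection and a best-versus-worst preference pair. The `Trajectory` fields, the weights `w_token` and `w_read`, and the linear form of the reward are all assumptions made for illustration; the abstract does not specify the actual reward formula or selection rule used by Optima.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One sampled multi-agent conversation (hypothetical container)."""
    task_score: float   # task metric in [0, 1], e.g. F1 or accuracy
    num_tokens: int     # total tokens exchanged between agents
    readability: float  # assumed readability score in [0, 1], higher is better


def reward(t, max_tokens, w_token=0.5, w_read=0.2):
    """Assumed linear reward: task score minus token cost plus readability bonus."""
    token_cost = t.num_tokens / max_tokens  # normalize token usage to [0, 1]
    return t.task_score - w_token * token_cost + w_read * t.readability


def rank(trajectories, **weights):
    """Sort trajectories from highest to lowest reward."""
    max_tokens = max(t.num_tokens for t in trajectories)
    return sorted(
        trajectories,
        key=lambda t: reward(t, max_tokens, **weights),
        reverse=True,
    )


def select_top_k(trajectories, k, **weights):
    """Keep the k best trajectories, e.g. as supervised fine-tuning data."""
    return rank(trajectories, **weights)[:k]


def preference_pair(trajectories, **weights):
    """Return (chosen, rejected) as the best and worst trajectory, e.g. for DPO."""
    ranked = rank(trajectories, **weights)
    return ranked[0], ranked[-1]
```

With these assumed weights, a conversation that reaches the same task score with fewer tokens ranks higher, which is the effectiveness-efficiency trade-off the abstract describes. In an iterative setup, the selected data would be used to train the model before the next round of generation.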

Weize Chen, Jiarui Yuan, Chen Qian, Cheng Yang, Zhiyuan Liu, Maosong Sun • 2024

Related benchmarks

Task                  Dataset    Metric        Result  Rank
Debate                ARC-C      Accuracy      78.8    17
Debate                MMLU       Accuracy      61.2    17
Information Exchange  HotpotQA   F1 Score      0.574   17
Information Exchange  2WMH QA    F1 Score      76.7    17
Information Exchange  TriviaQA   F1 Score      77.5    17
Debate                MATH       Accuracy      36.9    10
Debate                GSM8K      Accuracy      84.4    10
Information Exchange  CBT        F1 Score      72.2    10
DeepSearch            WebWalker  Success Rate  46      9
