Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Efficient LLM Collaboration via Planning

About

Recently, large language models (LLMs) have demonstrated strong performance, ranging from simple to complex tasks. However, while large models achieve remarkable results across diverse tasks, they often incur substantial monetary inference cost, making frequent use impractical for many applications. In contrast, small models are often freely available and easy to deploy locally, but their performance on complex tasks remains limited. This trade-off raises a natural question: how can small and large models efficiently collaborate to combine their complementary strengths? To bridge this trade-off, we propose COPE, a test-time collaboration framework. A planner model first generates a plan that serves as a lightweight intermediate that guides a downstream executor model. Small and large models take turns acting as planner and executor, exchanging plans in a multi-stage cascade to collaboratively solve tasks. Through comprehensive experiments on benchmarks spanning mathematical reasoning, code generation, open-ended tasks, and agent tasks, we demonstrate that COPE achieves performance comparable to large proprietary models, while drastically reducing the inference API cost. These results highlight planning as an effective prior for cost-efficient inference.

Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Kyungjoon Park, Dongjun Lee, Jinwoo Shin• 2025

Related benchmarks

TaskDatasetResultRank
Mathematical ReasoningMATH500 (test)
Accuracy75.8
895
Agent TaskAlfWorld
Success Rate36.9
40
Code GenerationMBPP
Accuracy66.4
24
Mathematical ReasoningAIME 2024
Accuracy40
8
Open-ended tasksMT-Bench (test)
Win Rate44.8
2
Showing 5 of 5 rows

Other info

Follow for update