Divide-or-Conquer? Which Part Should You Distill Your LLM?

About

Recent methods have demonstrated that Large Language Models (LLMs) solve reasoning tasks better when they are encouraged to first solve subtasks of the main task. In this paper, we devise a similar strategy that breaks reasoning tasks into a problem decomposition phase and a problem solving phase, and we show that this two-stage strategy outperforms a single-stage solution. Further, we hypothesize that the decomposition should be easier to distill into a smaller model than the problem solving, because the latter requires large amounts of domain knowledge while the former only requires learning general problem-solving strategies. We propose methods to distill these two capabilities and evaluate their impact on reasoning outcomes and inference cost. We find that we can distill the problem decomposition phase while achieving good generalization across tasks, datasets, and models. However, it is harder to distill the problem solving capability without losing performance, and the resulting distilled model struggles to generalize. These results indicate that by combining smaller, distilled problem decomposition models with problem solving LLMs, we can achieve reasoning with cost-efficient inference and local adaptation.
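The two-phase strategy described above can be sketched as a small pipeline in which a decomposer produces subquestions and a solver answers them in order before answering the original question. This is a minimal illustration under stated assumptions, not the authors' implementation: models are treated as plain text-in/text-out functions, the prompt wording and the "one subquestion per line" output format are assumed, and the names `decompose`, `solve`, `divide_then_conquer`, `toy_decomposer`, and `toy_solver` are hypothetical. The toy stubs stand in for real models only so the sketch runs.

```python
from typing import Callable, List

# Hypothetical interface: any text-in/text-out function can act as a "model".
Model = Callable[[str], str]


def decompose(decomposer: Model, question: str) -> List[str]:
    """Phase 1: ask the decomposer for subquestions.

    Assumption: the decomposer returns one subquestion per line.
    """
    raw = decomposer(f"Decompose into subquestions:\n{question}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def solve(solver: Model, question: str, subquestions: List[str]) -> str:
    """Phase 2: answer each subquestion in order, then the original question.

    Earlier (subquestion, answer) pairs are fed back as context.
    """
    context: List[str] = []
    for sub in subquestions:
        prompt = "\n".join(context + [f"Q: {sub}", "A:"])
        context += [f"Q: {sub}", f"A: {solver(prompt)}"]
    return solver("\n".join(context + [f"Q: {question}", "A:"]))


def divide_then_conquer(decomposer: Model, solver: Model, question: str) -> str:
    return solve(solver, question, decompose(decomposer, question))


# --- Toy stand-ins (hypothetical) so the sketch runs without real models ---
def toy_decomposer(prompt: str) -> str:
    return "How many apples does Ann have?\nHow many apples does Bob have?\n"


def toy_solver(prompt: str) -> str:
    lines = prompt.splitlines()
    current = lines[-2]  # the "Q: ..." line just before the trailing "A:"
    facts = {
        "Q: How many apples does Ann have?": "3",
        "Q: How many apples does Bob have?": "4",
    }
    if current in facts:
        return facts[current]
    # Final question: add up the answers collected so far.
    return str(sum(int(line[3:]) for line in lines if line.startswith("A: ")))


print(divide_then_conquer(toy_decomposer, toy_solver,
                          "How many apples do they have together?"))  # prints 7
```

Keeping the two phases behind separate callables mirrors the setting the abstract argues for: the decomposer can be swapped for a smaller, distilled model while the solver remains a large LLM.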

Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang (2024)

Related benchmarks

Task	Dataset	Metric	Result
Mathematical Reasoning	GSM8K	Accuracy	88.43
Mathematical Reasoning	MATH	Accuracy	76.05
Mathematical Reasoning	TabMWP	Accuracy	95.62
Commonsense Reasoning	CSQA	Accuracy	77.42
Natural Language Inference	aNLI	Accuracy	63.89
Question Answering	ARC-C	Accuracy	90.12
Question Answering	SQA	Accuracy	77.15
Reasoning	Date	Accuracy	74.92
