MCC-KD: Multi-CoT Consistent Knowledge Distillation

About

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain of thought (CoT) prompting. Recently, there has been a growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both the diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill the reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
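
The consistency term described above can be illustrated with a small sketch. It assumes the student produces one discrete answer distribution per rationale, and it takes "bidirectional KL-divergence" to mean the sum KL(p‖q) + KL(q‖p). The function names, the example logits, and the unweighted sum are illustrative assumptions; the abstract does not specify the paper's exact loss.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def bidirectional_kl(p, q):
    """Symmetric consistency penalty: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical student logits over three answer options, obtained by
# conditioning the same question on two different rationales.
logits_rationale_a = [2.0, 0.5, -1.0]
logits_rationale_b = [1.5, 1.0, -0.5]

p = softmax(logits_rationale_a)
q = softmax(logits_rationale_b)

# The penalty is zero when both rationales yield the same answer
# distribution and grows as the two distributions diverge.
consistency_loss = bidirectional_kl(p, q)
print(f"consistency loss: {consistency_loss:.4f}")
```

In a training setup, a term like this would typically be added to the usual distillation loss so that different rationales for the same question push the student toward the same answer distribution.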

Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | CSQA | Accuracy: 81.72 | 366 |
| Commonsense Reasoning | StrategyQA | Accuracy: 67.99 | 208 |
| Reasoning | ARC-C | -- | 112 |
| Commonsense Reasoning | CommonsenseQA | Accuracy (pass@1): 40.67 | 108 |
| Mathematical Reasoning | MATH 500 | MATH 500 Accuracy: 82.2 | 106 |
| Mathematical Reasoning | AMC23 | Average@16: 35.47 | 63 |
| Reasoning | StrategyQA | Accuracy: 64.37 | 52 |
| Mathematical Reasoning | AIME 25 | Average@16 Score: 5.21 | 33 |
| Mathematical Reasoning | AIME24 | AIME24 Avg@16: 5.63 | 26 |
| Mathematical Reasoning | MATH500 | Accuracy: 82.2 | 21 |
Showing 10 of 13 rows
