
MCC-KD: Multi-CoT Consistent Knowledge Distillation

About

Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain-of-thought (CoT) prompting. Recently, there has been growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both diversity and consistency in the rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
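The consistency objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the plain-Python softmax, and the 0.5 averaging of the two KL directions are assumptions; the paper only states that the bidirectional KL-divergence between the answer distributions of two rationales is minimized.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution over answers."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mcc_consistency_loss(logits_a, logits_b):
    """Bidirectional KL between the answer distributions produced by two
    rationales for the same question (hypothetical helper; the averaging
    by 0.5 is an assumption)."""
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl_div(p, q) + kl_div(q, p))
```

The loss is zero when the two rationales yield identical answer distributions and grows as they disagree, which is what pushes the student toward prediction consistency across its multiple CoTs.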

Hongzhan Chen, Siyue Wu, Xiaojun Quan, Rui Wang, Ming Yan, Ji Zhang• 2023

Related benchmarks

| Task                   | Dataset       | Metric             | Result | Rank |
|------------------------|---------------|--------------------|--------|------|
| Commonsense Reasoning  | CSQA          | Accuracy           | 81.72  | 366  |
| Commonsense Reasoning  | StrategyQA    | Accuracy           | 67.99  | 174  |
| Mathematical Reasoning | MATH 500      | MATH 500 Accuracy  | 82.2   | 106  |
| Reasoning              | ARC-C         | --                 | --     | 80   |
| Commonsense Reasoning  | CommonsenseQA | Accuracy (pass@1)  | 40.67  | 45   |
| Reasoning              | StrategyQA    | Accuracy           | 64.37  | 40   |
| Mathematical Reasoning | AIME 25       | Average@16 Score   | 5.21   | 26   |
| Mathematical Reasoning | AMC23         | Average@16         | 35.47  | 26   |
| Mathematical Reasoning | MATH500       | Accuracy           | 82.2   | 21   |
| Mathematical Reasoning | SVAMP         | Accuracy           | 91     | 21   |

Showing 10 of 13 rows
