MCC-KD: Multi-CoT Consistent Knowledge Distillation
About
Large language models (LLMs) have showcased remarkable capabilities in complex reasoning through chain-of-thought (CoT) prompting. Recently, there has been growing interest in transferring these reasoning abilities from LLMs to smaller models. However, achieving both diversity and consistency in rationales presents a challenge. In this paper, we focus on enhancing these two aspects and propose Multi-CoT Consistent Knowledge Distillation (MCC-KD) to efficiently distill reasoning capabilities. In MCC-KD, we generate multiple rationales for each question and enforce consistency among the corresponding predictions by minimizing the bidirectional KL-divergence between the answer distributions. We investigate the effectiveness of MCC-KD with different model architectures (LLaMA/FlanT5) and various model scales (3B/7B/11B/13B) on both mathematical reasoning and commonsense reasoning benchmarks. The empirical results not only confirm MCC-KD's superior performance on in-distribution datasets but also highlight its robust generalization ability on out-of-distribution datasets.
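The consistency term described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `multi_cot_consistency_loss` is a hypothetical helper that averages the bidirectional KL-divergence over all pairs of answer distributions, where each distribution comes from one generated rationale for the same question.

```python
import math

def kl_div(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_cot_consistency_loss(answer_dists):
    """Average bidirectional KL-divergence over all pairs of answer
    distributions (one per rationale). Hypothetical sketch of the
    consistency objective, not the paper's exact code."""
    total, pairs = 0.0, 0
    n = len(answer_dists)
    for i in range(n):
        for j in range(i + 1, n):
            p, q = answer_dists[i], answer_dists[j]
            # Symmetrize: KL(p||q) + KL(q||p)
            total += kl_div(p, q) + kl_div(q, p)
            pairs += 1
    return total / pairs if pairs else 0.0
```

If the rationales all lead to the same answer distribution the loss is zero; disagreement between rationales increases it, pushing the student toward answers that are stable across chains of thought.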
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Commonsense Reasoning | CSQA | Accuracy | 81.72 | 366 |
| Commonsense Reasoning | StrategyQA | Accuracy | 67.99 | 125 |
| Mathematical Reasoning | MATH 500 | Accuracy | 82.2 | 106 |
| Mathematical Reasoning | MATH500 | Accuracy | 82.2 | 21 |
| Mathematical Reasoning | SVAMP | Accuracy | 91 | 21 |
| Mathematical Reasoning | GSM8K | Accuracy | 90.52 | 21 |