The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
About
In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compared standard CoT and CCoT prompts to measure how conciseness affects response length and correct-answer accuracy. We evaluated both prompting styles using GPT-3.5 and GPT-4 on a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurred a performance penalty of 27.69%. Overall, CCoT leads to an average per-token cost reduction of 22.67%. All code, data, and supplemental materials are available on GitHub at https://github.com/matthewrenze/jhu-concise-cot
Matthew Renze, Erhan Guven • 2024
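The core idea is a small change to the prompt: a standard CoT prompt asks the model to reason step by step, while a CCoT prompt additionally instructs it to keep that reasoning brief. A minimal sketch is below; the exact instruction wording and the `build_prompt` helper are illustrative assumptions, not the paper's verbatim templates.

```python
def build_prompt(question: str, concise: bool = False) -> str:
    """Build a CoT prompt; with concise=True, add a brevity instruction (CCoT).

    The instruction text here is a hypothetical approximation of the
    paper's prompts, shown only to illustrate the technique.
    """
    instruction = "Think through the problem step by step."
    if concise:
        # CCoT: same reasoning request, plus an explicit conciseness constraint
        instruction += " Be concise and limit your explanation."
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"


# The two conditions being compared differ only in the added instruction:
cot_prompt = build_prompt("What is 17 * 24?")
ccot_prompt = build_prompt("What is 17 * 24?", concise=True)
```

In the paper's setup, the two prompt variants are then sent to the same model (GPT-3.5 or GPT-4), and response length and MCQA accuracy are compared across conditions.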
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 93.33 | 351 |
| Mathematical Reasoning | AMC23 | Accuracy | 90.83 | 18 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg) | 81.9 | 18 |
| Mathematical Reasoning | MATH | Accuracy | 92.33 | 18 |
| Mathematical Reasoning | AIME 24 | Accuracy | 51.11 | 18 |
| Medical Question Answering | Medical Benchmarks (MedQA, MedMCQA, BULLET) (test) | MedQA Accuracy | 0.4917 | 18 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K) | 96 | 8 |