The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
About
In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compared standard CoT and CCoT prompts to measure how conciseness affects response length and correct-answer accuracy. We evaluated both prompting styles using GPT-3.5 and GPT-4 on a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurred a performance penalty of 27.69%. Overall, CCoT leads to an average per-token cost reduction of 22.67%. All code, data, and supplemental materials are available on GitHub at https://github.com/matthewrenze/jhu-concise-cot
Matthew Renze, Erhan Guven • 2024
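The core idea is a small change to the prompt: a standard CoT prompt asks the model to reason step by step, while a CCoT prompt additionally instructs it to keep that reasoning brief. A minimal sketch is below; the exact instruction wording and the `build_prompt` helper are illustrative assumptions, not the paper's verbatim templates.

```python
def build_prompt(question: str, concise: bool = False) -> str:
    """Build a CoT prompt; with concise=True, add a brevity instruction (CCoT).

    The instruction text here is a hypothetical approximation of the
    paper's prompts, shown only to illustrate the technique.
    """
    instruction = "Think through the problem step by step."
    if concise:
        # CCoT: same reasoning request, plus an explicit conciseness constraint
        instruction += " Be concise and limit your explanation."
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"


# The two conditions being compared differ only in the added instruction:
cot_prompt = build_prompt("What is 17 * 24?")
ccot_prompt = build_prompt("What is 17 * 24?", concise=True)
```

In the paper's setup, the two prompt variants are then sent to the same model (GPT-3.5 or GPT-4), and response length and MCQA accuracy are compared across conditions.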
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy | 93.33 | 351 |
| Mathematical Reasoning | AMC23 | Accuracy | 90.83 | 18 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg) | 81.9 | 18 |
| Mathematical Reasoning | MATH | Accuracy | 92.33 | 18 |
| Mathematical Reasoning | AIME 24 | Accuracy | 51.11 | 18 |
| Medical Question Answering | Medical Benchmarks (MedQA, MedMCQA, BULLET) (test) | MedQA Accuracy | 0.4917 | 18 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K) | 96 | 8 |