ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy

About

While Chain-of-Thought (CoT) prompting improves reasoning in large language models (LLMs), the excessive length of reasoning tokens increases latency and KV cache memory usage, and may even truncate final answers under context limits. We propose ThinkLess, an inference-efficient framework that terminates reasoning generation early and maintains output quality without modifying the model. Attention analysis reveals that answer tokens focus minimally on earlier reasoning steps and primarily attend to the reasoning terminator token, due to information migration under causal masking. Building on this insight, ThinkLess inserts the terminator token at earlier positions to skip redundant reasoning while preserving the underlying knowledge transfer. To prevent format disruption caused by early termination, ThinkLess employs a lightweight post-regulation mechanism, relying on the model's natural instruction-following ability to produce well-structured answers. Without fine-tuning or auxiliary data, ThinkLess achieves comparable accuracy to full-length CoT decoding while greatly reducing decoding time and memory consumption.
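
The two steps described above, closing the reasoning block early and then appending a short formatting instruction, can be illustrated with a minimal prompt-construction sketch. This is not the authors' implementation: the token strings `<think>` and `</think>`, the wording of the instruction, and the helper names `truncate_reasoning` and `build_early_termination_prompt` are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List

# Assumed token strings; the abstract does not name the actual tokens.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"  # stands in for the reasoning terminator token

# Hypothetical post-regulation instruction, not the paper's wording.
REGULATION = "Give the final answer in the form 'Answer: <value>'."


def truncate_reasoning(reasoning_tokens: List[str], keep: int) -> List[str]:
    """Keep only the first `keep` reasoning tokens (keep=0 skips reasoning)."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return reasoning_tokens[:keep]


def build_early_termination_prompt(question: str, partial_reasoning: str = "") -> str:
    """Return a prompt whose reasoning block is closed early.

    The terminator is placed directly after whatever reasoning was kept
    (possibly none), followed by the formatting instruction.
    """
    return (
        f"{question}\n"
        f"{THINK_OPEN}{partial_reasoning}{THINK_CLOSE}\n"
        f"{REGULATION}\n"
    )


if __name__ == "__main__":
    tokens = "First add 2 and 3 then double the sum".split()
    kept = " ".join(truncate_reasoning(tokens, 3))
    print(build_early_termination_prompt("What is (2 + 3) * 2?", kept))
```

In a real decoding loop the resulting string would be passed to the model as the prefix for answer generation; how early the terminator can be placed without hurting accuracy is the empirical question the paper addresses.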

Gengyang Li, Yifeng Gao, Yuming Li, Yunfang Wu • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MATH Hard	Accuracy 81.8	198
Science Reasoning	ARC-C	Accuracy 95.1	58
Mathematical Reasoning	AIME 2024	Accuracy 3.3	54
Science Reasoning	GPQA D	Accuracy 54.5	52
Math Reasoning	GSM8K	Accuracy 93.4	49
Mathematical Reasoning	MATH Easy	Accuracy 96.9	36
Scientific Reasoning	SCIENTIFIC	Accuracy 74.8	36
Math Reasoning	MATH 500	Accuracy 87.6	36
Math and Science Reasoning	Average	Accuracy 62.9	36
Math Reasoning	AIME 2025	Accuracy 30	36

Showing 10 of 21 rows
