S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs
About
Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method induces style-aligned, variable-length reasoning traces from the target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of SFT-based methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data across variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints can be found at https://github.com/DYR1/S3-CoT.
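As a rough illustration of the activation-steering idea behind the self-sampling step, the sketch below computes a difference-of-means steering direction between hidden activations of concise and verbose traces, then shifts a hidden state along that direction. This is a minimal NumPy toy, not the paper's implementation: the function names, the difference-of-means construction, and the scaling coefficient `alpha` are all illustrative assumptions.

```python
import numpy as np

def steering_vector(concise_acts: np.ndarray, verbose_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means steering direction (illustrative, not the paper's exact recipe).

    concise_acts, verbose_acts: (num_samples, hidden_dim) activation matrices
    collected from concise vs. verbose reasoning traces.
    """
    return concise_acts.mean(axis=0) - verbose_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the steering direction; larger alpha
    pushes generation further toward the 'concise' activation regime."""
    return hidden + alpha * direction

# Toy demo with synthetic activations (hidden_dim = 4).
rng = np.random.default_rng(0)
concise = rng.normal(size=(8, 4)) + 1.0   # stand-in for concise-trace activations
verbose = rng.normal(size=(8, 4)) - 1.0   # stand-in for verbose-trace activations

v = steering_vector(concise, verbose)
steered = apply_steering(np.zeros(4), v, alpha=0.5)
```

In a real LLM this shift would typically be applied inside a forward hook at a chosen transformer layer, and sweeping `alpha` is one way to sample reasoning traces of varying lengths from the same model.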
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy: 93.17 | 351 |
| Mathematical Reasoning | AMC23 | Accuracy: 90.83 | 18 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg): 81.28 | 18 |
| Medical Question Answering | Medical Benchmarks (MedQA, MedMCQA, BULLET) (test) | MedQA Accuracy: 0.545 | 18 |
| Mathematical Reasoning | MATH | Accuracy: 92 | 18 |
| Mathematical Reasoning | AIME 24 | Accuracy: 51.11 | 18 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K): 95 | 8 |