S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs
About
Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes, motivating a key question: can LLMs acquire a fast-thinking mode analogous to human System 1 reasoning? To explore this, our study presents a self-sampling framework based on activation steering for efficient CoT learning. Our method induces style-aligned, variable-length reasoning traces from the target LLMs themselves without any teacher guidance, thereby alleviating a central bottleneck of SFT-based methods: the scarcity of high-quality supervision data. Using data filtered by gold answers, we perform SFT for efficient CoT learning with (i) a human-like dual-cognitive system and (ii) a progressive compression curriculum. Furthermore, we explore a self-evolution regime in which SFT is driven solely by prediction-consistent data across variable-length variants, eliminating the need for gold answers. Extensive experiments on math benchmarks, together with cross-domain generalization tests in medicine, show that our method yields stable improvements for both general and R1-style LLMs. Our data and model checkpoints can be found at https://github.com/DYR1/S3-CoT.
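As a rough illustration of the activation-steering idea behind the self-sampling step, the sketch below computes a difference-of-means steering direction between hidden activations of concise and verbose traces, then shifts a hidden state along that direction. This is a minimal NumPy toy, not the paper's implementation: the function names, the difference-of-means construction, and the scaling coefficient `alpha` are all illustrative assumptions.

```python
import numpy as np

def steering_vector(concise_acts: np.ndarray, verbose_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means steering direction (illustrative, not the paper's exact recipe).

    concise_acts, verbose_acts: (num_samples, hidden_dim) activation matrices
    collected from concise vs. verbose reasoning traces.
    """
    return concise_acts.mean(axis=0) - verbose_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Shift a hidden state along the steering direction; larger alpha
    pushes generation further toward the 'concise' activation regime."""
    return hidden + alpha * direction

# Toy demo with synthetic activations (hidden_dim = 4).
rng = np.random.default_rng(0)
concise = rng.normal(size=(8, 4)) + 1.0   # stand-in for concise-trace activations
verbose = rng.normal(size=(8, 4)) - 1.0   # stand-in for verbose-trace activations

v = steering_vector(concise, verbose)
steered = apply_steering(np.zeros(4), v, alpha=0.5)
```

In a real LLM this shift would typically be applied inside a forward hook at a chosen transformer layer, and sweeping `alpha` is one way to sample reasoning traces of varying lengths from the same model.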
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | GSM8K | Accuracy: 93.17 | 351 |
| Mathematical Reasoning | AMC23 | Accuracy: 90.83 | 18 |
| Mathematical Reasoning | Math Benchmarks Aggregate | Accuracy (Avg): 81.28 | 18 |
| Medical Question Answering | Medical Benchmarks (MedQA, MedMCQA, BULLET) (test) | MedQA Accuracy: 0.545 | 18 |
| Mathematical Reasoning | MATH | Accuracy: 92 | 18 |
| Mathematical Reasoning | AIME 24 | Accuracy: 51.11 | 18 |
| Mathematical Reasoning | Math Benchmarks (GSM8K, MATH, AMC23, AIME24) (test) | Accuracy (GSM8K): 95 | 8 |