
Thinker: Learning to Think Fast and Slow

About

Recent studies show that the reasoning capabilities of Large Language Models (LLMs) can be improved by applying Reinforcement Learning (RL) to question-answering (QA) tasks in areas such as math and coding. With a long context length, LLMs may learn to perform search, as indicated by the self-correction behavior observed in DeepSeek R1. However, this search behavior is often imprecise and lacks confidence, resulting in long, redundant responses and highlighting deficiencies in intuition and verification. Inspired by the Dual Process Theory in psychology, we introduce a simple modification to the QA task that includes four stages: Fast Thinking, where the LLM must answer within a strict token budget; Verification, where the model evaluates its initial response; Slow Thinking, where it refines the initial response with more deliberation; and Summarization, where it distills the refinement from the previous stage into precise steps. Our proposed task improves average accuracy from 25.6% to 27.3% for Qwen2.5-1.5B, and from 45.9% to 51.0% for DeepSeek-R1-Qwen-1.5B. Notably, for Qwen2.5-1.5B, the Fast Thinking mode alone achieves 25.2% accuracy using fewer than 1000 tokens, demonstrating substantial inference efficiency gains. These findings suggest that intuition and deliberative reasoning are distinct, complementary systems benefiting from targeted training. Additionally, we have open-sourced both the trained models and the source code.
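The four-stage task described above can be sketched as a simple prompting loop. This is a hypothetical illustration, not the authors' released code: the `generate` callable stands in for any LLM call of the form `(prompt, max_tokens) -> text`, the prompt wording is invented, and the token budgets are placeholders (the abstract only specifies that Fast Thinking uses fewer than 1000 tokens).

```python
# Hypothetical sketch of the four-stage Thinker task (Fast Thinking,
# Verification, Slow Thinking, Summarization). `generate` is any
# (prompt, max_tokens) -> text LLM wrapper supplied by the caller.

def thinker_episode(question, generate, fast_budget=1000, slow_budget=8000):
    """Run one question through the four stages and return all outputs."""
    # Stage 1: Fast Thinking -- answer under a strict token budget.
    fast = generate(f"Answer concisely:\n{question}", max_tokens=fast_budget)

    # Stage 2: Verification -- the model evaluates its initial response.
    verdict = generate(
        f"Question:\n{question}\nInitial answer:\n{fast}\n"
        "Is the initial answer correct? Explain briefly.",
        max_tokens=512,
    )

    # Stage 3: Slow Thinking -- refine the initial response with more deliberation.
    slow = generate(
        f"Question:\n{question}\nInitial answer:\n{fast}\n"
        f"Verification:\n{verdict}\n"
        "Think step by step and produce a refined answer.",
        max_tokens=slow_budget,
    )

    # Stage 4: Summarization -- distill the refinement into precise steps.
    summary = generate(
        f"Question:\n{question}\nRefined solution:\n{slow}\n"
        "Summarize the solution as a short list of precise steps.",
        max_tokens=1024,
    )
    return {"fast": fast, "verification": verdict, "slow": slow, "summary": summary}

# Demo with a trivial stand-in "model" that just echoes its budget.
def dummy_generate(prompt, max_tokens):
    return f"[reply within {max_tokens} tokens]"

result = thinker_episode("What is 2 + 2?", dummy_generate)
print(sorted(result))
```

In an RL setup each stage's output would be scored (e.g. the final answer checked against ground truth) and used as a training signal; the sketch only shows the inference-time structure of the task.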

Stephen Chung, Wenyu Du, Jie Fu • 2025

Related benchmarks

Task                     Dataset          Result                  Rank
Mathematical Reasoning   Minerva          Pass@1 38.16            138
Mathematical Reasoning   MATH             Pass@1 92.71            112
Mathematical Reasoning   AMC              Pass@1 85.09            112
Mathematical Reasoning   Olympiad Bench   Accuracy 25.2           73
Mathematical Reasoning   AIME 2025        Accuracy 0.00           58
Mathematical Reasoning   Minerva Math     Accuracy 23.5           54
Mathematical Reasoning   Olympiad         Pass@1 58.62            50
Mathematical Reasoning   MATH 500         Pass@1 Accuracy 63.4    25
Mathematical Reasoning   AMC 2023         Accuracy 32.5           11
