
Self-Discover: Large Language Models Self-Compose Reasoning Structures

About

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x less inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
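The two-stage flow described in the abstract (select reasoning modules, compose them into a structure, then follow that structure when solving) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the `LLM` callable interface, the function names `discover_structure` and `solve`, the prompt wording, and the two-item module list are all assumptions made for the example. The abstract names only critical thinking and step-by-step thinking as example modules and does not give the prompts.

```python
from typing import Callable, List

# Assumed interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]

# Hypothetical seed list; the abstract names only these two modules as examples.
REASONING_MODULES = [
    "Use critical thinking to question assumptions and evaluate evidence.",
    "Think step by step.",
]


def discover_structure(llm: LLM, task_examples: List[str], modules: List[str]) -> str:
    """Stage 1: select useful modules, then compose them into a reasoning structure."""
    examples = "\n".join(f"- {ex}" for ex in task_examples)
    module_list = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(modules))
    # First call: the model picks which modules suit the task (illustrative prompt).
    selected = llm(
        "Select the reasoning modules that are useful for these tasks.\n"
        f"Modules:\n{module_list}\nTasks:\n{examples}"
    )
    # Second call: the model composes the selection into an explicit structure.
    return llm(
        "Compose the selected modules into an explicit reasoning structure "
        f"for these tasks.\nSelected modules:\n{selected}\nTasks:\n{examples}"
    )


def solve(llm: LLM, structure: str, task: str) -> str:
    """Stage 2: the model follows the discovered structure on one task instance."""
    return llm(
        "Follow this reasoning structure to solve the task.\n"
        f"Structure:\n{structure}\nTask:\n{task}"
    )
```

In this sketch the structure is discovered once per task type and reused for every instance, so each solved instance costs one model call after discovery. That is consistent with the abstract's claim of lower inference compute than sampling-heavy methods, though the exact accounting here is an assumption.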

Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng • 2024

Related benchmarks

Task                   | Dataset                      | Result                 | Rank
Mathematical Reasoning | SVAMP                        | Accuracy 17.33         | 403
Commonsense Reasoning  | CSQA                         | Accuracy 57.33         | 366
Mathematical Reasoning | GSM8K                        | Accuracy (GSM8K) 7.43  | 358
Math Reasoning         | GSM8K (test)                 | Accuracy 56.33         | 192
Math                   | MATH 500                     | Accuracy 90.3          | 86
Mathematics            | AIME 2025                    | Accuracy 48.3          | 66
Mathematics            | AIME 2024                    | Accuracy 66.7          | 60
Reasoning              | BIG-Bench Hard (BBH) (test)  | Average Accuracy 56.9  | 56
Logic reasoning        | Tracking Shuffled Objects BBH | Accuracy 60.03        | 54
Commonsense Reasoning  | MMLU                         | Accuracy 52.63         | 41
Showing 10 of 17 rows
