Self-Discover: Large Language Models Self-Compose Reasoning Structures

About

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x less inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.
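The select-then-compose process described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the function names (`select_modules`, `compose_structure`, `solve`), the third seed module, and the `toy_llm` stand-in are all assumptions made for the example. Any callable that maps a prompt string to a reply string could be substituted for `toy_llm`.

```python
from typing import Callable, List

# An LLM is modelled as any function from prompt text to reply text (assumption).
LLM = Callable[[str], str]

# The abstract names the first two modules; the third is a hypothetical extra.
REASONING_MODULES: List[str] = [
    "critical thinking",
    "step-by-step thinking",
    "break the problem into sub-problems",
]


def select_modules(llm: LLM, task_examples: List[str], modules: List[str]) -> List[str]:
    """Ask the model which atomic reasoning modules suit the task."""
    prompt = (
        "Select the reasoning modules useful for this task.\n"
        f"Modules: {'; '.join(modules)}\n"
        f"Task examples: {' | '.join(task_examples)}"
    )
    reply = llm(prompt).lower()
    # Keep only modules the model actually mentioned in its reply.
    return [m for m in modules if m.lower() in reply]


def compose_structure(llm: LLM, task_examples: List[str], selected: List[str]) -> str:
    """Ask the model to turn the selected modules into an explicit structure."""
    prompt = (
        "Compose an explicit, ordered reasoning structure from these modules.\n"
        f"Modules: {'; '.join(selected)}\n"
        f"Task examples: {' | '.join(task_examples)}"
    )
    return llm(prompt)


def solve(llm: LLM, structure: str, question: str) -> str:
    """Answer one question while following the previously composed structure."""
    prompt = (
        "Follow the reasoning structure below to answer the question.\n"
        f"Structure: {structure}\n"
        f"Question: {question}"
    )
    return llm(prompt)


def toy_llm(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model (assumption)."""
    if prompt.startswith("Select"):
        return "critical thinking, step-by-step thinking"
    if prompt.startswith("Compose"):
        return "1. Restate the problem. 2. Work step by step. 3. Check the result."
    return "answer: 4"


if __name__ == "__main__":
    examples = ["What is 2 + 2?"]
    chosen = select_modules(toy_llm, examples, REASONING_MODULES)
    structure = compose_structure(toy_llm, examples, chosen)
    print(chosen)
    print(structure)
    print(solve(toy_llm, structure, "What is 2 + 2?"))
```

In this sketch the structure is composed once per task and then reused for every question, so each question costs a single model call at solve time.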

Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Mathematical Reasoning | SVAMP | Accuracy | 17.33 | 368 |
| Commonsense Reasoning | CSQA | Accuracy | 57.33 | 366 |
| Mathematical Reasoning | GSM8K | Accuracy (GSM8K) | 7.43 | 358 |
| Math Reasoning | GSM8K (test) | Accuracy | 56.33 | 155 |
| Logic reasoning | Tracking Shuffled Objects BBH | Accuracy | 60.03 | 54 |
| Commonsense Reasoning | MMLU | Accuracy | 52.63 | 37 |
| Logic reasoning | Causal Judgement | Accuracy | 36 | 30 |
| Reasoning | BIG-Bench Hard (BBH) (test) | Average Accuracy | 56.9 | 28 |
| Knowledge and Commonsense Reasoning | MMLU | Accuracy | 58.07 | 23 |
| Logical reasoning | T-Obj. | Accuracy | 2.4 | 23 |
Showing 10 of 11 rows
