StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving

About

Most existing prompting methods suffer from the issues of generalizability and consistency, as they often rely on instance-specific solutions that may not be applicable to other instances and lack task-level consistency across the selected few-shot examples. To address these limitations, we propose a comprehensive framework, StrategyLLM, allowing LLMs to perform inductive reasoning, deriving general strategies from specific task instances, and deductive reasoning, applying these general strategies to particular task examples, for constructing generalizable and consistent few-shot prompts. It employs four LLM-based agents: strategy generator, executor, optimizer, and evaluator, working together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that StrategyLLM outperforms the competitive baseline CoT-SC that requires human-annotated solutions on 13 datasets across 4 challenging tasks without human involvement, including math reasoning (34.2% → 38.8%), commonsense reasoning (70.3% → 72.5%), algorithmic reasoning (73.7% → 85.0%), and symbolic reasoning (30.0% → 79.2%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.

Chang Gao, Haiyun Jiang, Deng Cai, Shuming Shi, Wai Lam • 2023
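
The abstract names four cooperating agents (generator, executor, optimizer, evaluator) but gives no implementation detail, so the following is only a minimal sketch of how such a loop might be wired together. Everything specific here is an assumption for illustration: the function names, the `threshold`, `max_rounds`, and `top_k` parameters, the use of a separate validation set, and the toy addition task. The `generate`, `execute`, and `optimize` callables stand in for LLM calls and are replaced by deterministic stubs so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)


@dataclass
class Candidate:
    strategy: str
    accuracy: float = 0.0


def execution_accuracy(strategy: str, examples: List[Example],
                       execute: Callable[[str, str], str]) -> float:
    """Executor role: apply one strategy to every example and score it."""
    correct = sum(execute(strategy, q) == a for q, a in examples)
    return correct / len(examples)


def select_strategies(examples: List[Example], validation: List[Example],
                      generate: Callable[[List[Example]], List[str]],
                      execute: Callable[[str, str], str],
                      optimize: Callable[[str, List[Example]], str],
                      threshold: float = 0.75,   # assumed qualification cutoff
                      max_rounds: int = 2,       # assumed optimization budget
                      top_k: int = 1) -> List[Candidate]:
    # Generator role: induce candidate strategies from task examples.
    pool = [Candidate(s) for s in generate(examples)]
    qualified: List[Candidate] = []

    for round_idx in range(max_rounds + 1):
        for cand in pool:
            cand.accuracy = execution_accuracy(cand.strategy, examples, execute)
        qualified.extend(c for c in pool if c.accuracy >= threshold)
        failed = [c for c in pool if c.accuracy < threshold]
        if not failed or round_idx == max_rounds:
            break
        # Optimizer role: revise strategies that did not qualify.
        pool = [Candidate(optimize(c.strategy, examples)) for c in failed]

    # Evaluator role: re-score distinct qualified strategies on held-out data.
    unique = {c.strategy: c for c in qualified}
    for cand in unique.values():
        cand.accuracy = execution_accuracy(cand.strategy, validation, execute)
    ranked = sorted(unique.values(), key=lambda c: c.accuracy, reverse=True)
    return ranked[:top_k]


# --- Deterministic stubs standing in for LLM calls (toy addition task) ---
def toy_generate(examples: List[Example]) -> List[str]:
    return ["concatenate", "add"]


def toy_execute(strategy: str, question: str) -> str:
    left, right = question.split("+")
    if strategy == "add":
        return str(int(left) + int(right))
    return left + right  # the flawed "concatenate" strategy


def toy_optimize(strategy: str, examples: List[Example]) -> str:
    return "add"  # pretend the optimizer repaired the strategy


train = [("2+3", "5"), ("10+4", "14")]
held_out = [("1+1", "2")]
best = select_strategies(train, held_out, toy_generate, toy_execute, toy_optimize)
print(best)
```

In a real setting each stub would wrap a prompt to a language model, and the selected strategies would then be used to build the few-shot prompt; how the paper actually does this is not stated in the text above.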

Related benchmarks

Task	Dataset	Result
Reasoning	BIG-Bench Hard (BBH) (test)	Average Accuracy 57.1	62
Reasoning	StrategyQA	Accuracy 83.5	58
Multi-hop Reasoning	StrategyQA	Accuracy 83.5	50
Mathematical Reasoning	CP	Accuracy 54	20
Mathematical Reasoning	MA	Accuracy 91.3	20
Reasoning	CP	Accuracy 56	10
Reasoning	MA	Accuracy 98.7	10
Algorithmic Reasoning	Big-Bench Hard Word Sorting and Multi-step Arithmetic (test)	WS Accuracy 80	7
Commonsense Reasoning	StrategyQA and Big-Bench Hard Date Understanding (test)	StrategyQA Accuracy 71	7
Math Reasoning	MATH (test)	Algebra Score 64.5	7

Showing 10 of 11 rows
