Guiding Language Model Reasoning with Planning Tokens

About

Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most existing approaches to enhancing this ability rely heavily on data-driven methods and neglect the structural aspects of the model's reasoning capacity. To encourage more structured generation of CoT steps, we propose a hierarchical generation scheme: the LM generates a planning token at the start of each reasoning step, which intuitively serves as a high-level plan for that step, and the embeddings of these planning tokens are added to the model's parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements over standard fine-tuning baselines on three math word problem datasets and one multihop QA dataset.
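To make the hierarchical scheme and the size of its parameter overhead concrete, here is a minimal sketch. It is a hypothetical illustration, not the authors' implementation. The rule that labels each step by the first arithmetic operator it uses, the token names (`<plan_add>`, `<plan_other>`, and so on), and the model sizes used below are all assumptions chosen for the example. The parameter count includes only the new embedding rows. It assumes one input-embedding row per planning token, plus one output-head row when input and output embeddings are untied, because the model must also be able to generate the token.

```python
import re

# Hypothetical planning-token vocabulary (assumption): one token per
# arithmetic operator, plus a fallback for steps with no arithmetic.
OPERATOR_TOKENS = {
    "+": "<plan_add>",
    "-": "<plan_sub>",
    "*": "<plan_mul>",
    "/": "<plan_div>",
}
NO_OP_TOKEN = "<plan_other>"

# Matches an operator sitting between two digits, e.g. "3 * 4" or "12-5".
_OP_PATTERN = re.compile(r"\d\s*([+\-*/])\s*\d")


def planning_token(step: str) -> str:
    """Choose a planning token for one reasoning step (toy heuristic)."""
    match = _OP_PATTERN.search(step)
    return OPERATOR_TOKENS[match.group(1)] if match else NO_OP_TOKEN


def add_planning_tokens(steps: list[str]) -> str:
    """Prefix every chain-of-thought step with its planning token."""
    return "\n".join(f"{planning_token(step)} {step}" for step in steps)


def added_param_fraction(num_plan_tokens: int, hidden_size: int,
                         base_params: float, tied_embeddings: bool = False) -> float:
    """Fraction of parameters added by the new token embeddings alone."""
    # One input-embedding row per token; an untied output head needs another row.
    rows_per_token = 1 if tied_embeddings else 2
    return num_plan_tokens * hidden_size * rows_per_token / base_params


if __name__ == "__main__":
    steps = [
        "Tom buys 3 packs of 4 pens: 3 * 4 = 12 pens.",
        "He gives away 5: 12 - 5 = 7 pens.",
        "So the answer is 7.",
    ]
    print(add_planning_tokens(steps))
    # Illustrative sizes (assumption): 10 planning tokens, hidden size 4096,
    # a 6.74-billion-parameter base model with untied embeddings.
    frac = added_param_fraction(10, 4096, 6.74e9)
    print(f"{frac:.4%}")  # prints 0.0012%
```

Under these assumed sizes, the new rows total 81,920 parameters, or about 0.001% of the model. That is the same order of magnitude as the figure in the abstract. The operator heuristic only stands in for however the planning tokens are actually assigned to steps. During fine-tuning, the model would be trained on the annotated text so that it learns to emit a plan token before each step.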

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni • 2023

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	Game of 24	Accuracy: 7	147
Logical reasoning	ProntoQA (test)	Accuracy: 81.5	57
Logical reasoning	ProofWriter (test)	Accuracy: 49	57
Logical reasoning	ProofWriter	Accuracy: 49	43
Planning	Blocksworld (test)	Accuracy: 97	35
Combinatorial Reasoning	Graph Coloring	Accuracy: 64	30
Arithmetic Reasoning	Game of 24 (test)	Success Rate: 7	28
Planning	BlocksWorld	Blocksworld Accuracy: 97	26
Logical reasoning	Rule-chaining	Accuracy: 77	21
Combinatorial Search	N-Queens N=8	Accuracy: 16.1	21

Showing 10 of 13 rows
