
Guiding Language Model Reasoning with Planning Tokens

About

Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets and one multihop QA dataset with respect to standard fine-tuning baselines.
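The scheme described above can be illustrated with a minimal sketch: each reasoning step is prefixed with a planning token, and the only new trainable parameters are the embedding rows for those tokens. The token format `<plan_...>`, the step labels, and the helper names below are hypothetical, chosen only for illustration; the abstract does not specify how planning tokens are named or assigned.

```python
from typing import List


def plan_token(label: str) -> str:
    """Build a planning-token string (hypothetical format, not from the paper)."""
    return f"<plan_{label}>"


def annotate_steps(steps: List[str], labels: List[str]) -> List[str]:
    """Prefix each chain-of-thought step with its planning token."""
    if len(steps) != len(labels):
        raise ValueError("each reasoning step needs exactly one plan label")
    return [f"{plan_token(label)} {step}" for step, label in zip(steps, labels)]


def added_embedding_params(num_plan_tokens: int, hidden_size: int) -> int:
    """Count new trainable parameters, assuming one input-embedding row per token.

    Assumption: only input-embedding rows are counted; an untied output
    layer would add further rows that are not included here.
    """
    return num_plan_tokens * hidden_size


# Illustrative labels for a two-step arithmetic solution (assumed, not from the source).
steps = ["48 / 2 = 24", "48 + 24 = 72"]
labels = ["div", "add"]
annotated = annotate_steps(steps, labels)
print("\n".join(annotated))

# Illustrative sizes only: 10 planning tokens, hidden size 4096.
print(added_embedding_params(10, 4096))
```

The annotated steps would then serve as fine-tuning targets, so the model learns to emit a planning token before writing each step. The sizes in the parameter count are placeholders rather than figures from the paper.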

Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | Game of 24 | Accuracy: 7 | 147 |
| Logical reasoning | ProntoQA (test) | Accuracy: 81.5 | 57 |
| Logical reasoning | ProofWriter (test) | Accuracy: 49 | 57 |
| Logical reasoning | ProofWriter | Accuracy: 49 | 43 |
| Combinatorial Reasoning | Graph Coloring | Accuracy: 64 | 30 |
| Arithmetic Reasoning | Game of 24 (test) | Success Rate: 7 | 28 |
| Planning | BlocksWorld | Blocksworld Accuracy: 97 | 21 |
| Planning | Blocksworld (test) | Accuracy: 97 | 21 |
| Logical reasoning | Rule-chaining | Accuracy: 77 | 21 |
| Combinatorial Search | N-Queens N=8 | Accuracy: 16.1 | 21 |
Showing 10 of 13 rows
