
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

About

Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce the landscape of thoughts (LoT), the first landscape visualization tool for inspecting the reasoning trajectories produced by any reasoning method on any multiple-choice dataset. We represent the textual states in a trajectory as numerical features that quantify each state's distance to the answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative and quantitative analysis with the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, and different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt LoT to a model that predicts a property they observe. We showcase this advantage by adapting LoT to a lightweight verifier that evaluates the correctness of trajectories. Empirically, this verifier boosts both reasoning accuracy and the test-time scaling effect. The code is publicly available at: https://github.com/tmlr-group/landscape-of-thoughts.
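The pipeline the abstract describes can be sketched in a few lines: embed each intermediate reasoning state, represent it by its distances to the answer choices, and project those feature vectors to 2D with t-SNE. The sketch below uses random vectors in place of real LLM representations, and the array names and sizes are illustrative assumptions, not the tool's actual implementation.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical embeddings: 30 intermediate reasoning states and 4 answer
# choices. In the actual tool these would come from an LLM; here they are
# random vectors purely to illustrate the shape of the pipeline.
state_embeddings = rng.normal(size=(30, 64))
choice_embeddings = rng.normal(size=(4, 64))

# Represent each textual state by its distances to the answer choices,
# yielding a (num_states, num_choices) feature matrix.
features = np.linalg.norm(
    state_embeddings[:, None, :] - choice_embeddings[None, :, :], axis=-1
)

# Project the per-state feature vectors to 2D with t-SNE for plotting.
# perplexity must be smaller than the number of states.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(features)
print(coords.shape)  # (30, 2)
```

Plotting `coords` colored by trajectory or by correctness then gives a landscape in which clusters of states near a particular answer choice become visible.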

Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | CommonsenseQA | Accuracy: 64 | 136 |
| Multi-hop Reasoning | StrategyQA | Accuracy: 62.3 | 36 |
| General Knowledge Question Answering | MMLU | Accuracy: 62.3 | 18 |
| Reasoning | AQuA, MMLU, StrategyQA, and CommonSenseQA Average | Accuracy: 74 | 16 |
| Mathematical Reasoning | AQUA | -- | 16 |
| Language Understanding | MMLU | -- | 4 |
| Language Understanding | MMLU-Pro | -- | 4 |
