
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

About

Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce the landscape of thoughts (LoT), the first landscape visualization tool for inspecting the reasoning trajectories produced by any reasoning method on any multiple-choice dataset. We represent the textual states in a trajectory as numerical features that quantify each state's distance to the answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative and quantitative analysis with the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, and different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt LoT to a model that predicts a property they observe. We showcase this advantage by adapting LoT to a lightweight verifier that evaluates the correctness of trajectories. Empirically, this verifier boosts both reasoning accuracy and the test-time scaling effect. The code is publicly available at: https://github.com/tmlr-group/landscape-of-thoughts.
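The pipeline the abstract describes can be sketched in a few lines: embed each intermediate reasoning state, represent it by its distances to the answer choices, and project those feature vectors to 2D with t-SNE. The sketch below uses random vectors in place of real LLM representations, and the array names and sizes are illustrative assumptions, not the tool's actual implementation.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical embeddings: 30 intermediate reasoning states and 4 answer
# choices. In the actual tool these would come from an LLM; here they are
# random vectors purely to illustrate the shape of the pipeline.
state_embeddings = rng.normal(size=(30, 64))
choice_embeddings = rng.normal(size=(4, 64))

# Represent each textual state by its distances to the answer choices,
# yielding a (num_states, num_choices) feature matrix.
features = np.linalg.norm(
    state_embeddings[:, None, :] - choice_embeddings[None, :, :], axis=-1
)

# Project the per-state feature vectors to 2D with t-SNE for plotting.
# perplexity must be smaller than the number of states.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(features)
print(coords.shape)  # (30, 2)
```

Plotting `coords` colored by trajectory or by correctness then gives a landscape in which clusters of states near a particular answer choice become visible.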

Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | CommonsenseQA | Accuracy: 64 | 136 |
| Multi-hop Reasoning | StrategyQA | Accuracy: 62.3 | 36 |
| General Knowledge Question Answering | MMLU | Accuracy: 62.3 | 18 |
| Reasoning | AQuA, MMLU, StrategyQA, and CommonSenseQA Average | Accuracy: 74 | 16 |
| Mathematical Reasoning | AQUA | -- | 16 |
| Language Understanding | MMLU | -- | 4 |
| Language Understanding | MMLU-Pro | -- | 4 |
