
Structure Enables Effective Self-Localization of Errors in LLMs

About

Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI systems that can effectively correct themselves. We introduce a prompting method that structures reasoning as discrete, semantically coherent thought steps, and show that models can reliably localize errors within this structure, while failing to do so in conventional, unstructured chain-of-thought reasoning. Motivated by how the human brain monitors errors at discrete decision points and resamples alternatives, we introduce Iterative Correction Sampling of Thoughts (Thought-ICS), a self-correction framework. Thought-ICS iteratively prompts the model to generate reasoning one discrete, complete thought at a time, where each thought represents a deliberate decision by the model, creating natural boundaries for precise error localization. Upon verification, the model localizes the first erroneous step, and the system backtracks to generate alternative reasoning from the last correct point. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves a 20-40% lift in self-correction. In a completely autonomous setting without external verification, it outperforms contemporary self-correction baselines.
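The generate/verify/backtrack loop described in the abstract can be sketched in a few lines. The sketch below is illustrative only: the generator, verifier, and localizer are toy stand-ins (the actual method prompts an LLM for each of these roles), and all function names here are hypothetical, not taken from the paper's code.

```python
import random

def generate_thought(prefix, rng):
    """Toy stand-in for prompting the model for the next discrete thought.
    Emits the next step of a three-step derivation; step 2 sometimes
    contains an arithmetic slip, to exercise the backtracking path."""
    step = len(prefix)
    if step == 0:
        return "let x = 2 + 3"
    if step == 1:
        # half the time, produce an erroneous step
        return "then x * 2 = 9" if rng.random() < 0.5 else "then x * 2 = 10"
    return "answer: 10"

def localize_first_error(thoughts):
    """Toy stand-in for the self-localization prompt: return the index of
    the first erroneous thought, or None if the chain verifies as correct."""
    for i, t in enumerate(thoughts):
        if t == "then x * 2 = 9":
            return i
    return None

def thought_ics(max_thoughts=3, max_rounds=10, seed=1):
    """Generate one thought at a time; on verification failure, truncate
    the chain to the last correct thought and resample from that point."""
    rng = random.Random(seed)
    thoughts = []
    for _ in range(max_rounds):
        while len(thoughts) < max_thoughts:
            thoughts.append(generate_thought(thoughts, rng))
        err = localize_first_error(thoughts)  # verification + localization
        if err is None:
            return thoughts                   # chain verified correct
        thoughts = thoughts[:err]             # backtrack to last correct point
    return thoughts

print(thought_ics())
```

With the default seed, the first sampled chain contains the erroneous step, so the loop truncates back to the first thought and resamples, returning a verified chain. The discrete-thought boundaries are what make the `thoughts[:err]` truncation well defined; in unstructured chain-of-thought there is no clean index at which to cut.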

Ankur Samanta, Akshayaa Magesh, Ayush Jain, Kavosh Asadi, Youliang Yu, Daniel Jiang, Boris Vidolov, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni • 2026

Related benchmarks

Task                    Dataset   Metric          Result   Rank
Commonsense Reasoning   CSQA      Accuracy        96       366
Mathematical Reasoning  AIME      AIME Accuracy   72       283
Question Answering      GPQA      Accuracy        69       258
Science Reasoning       GPQA      Accuracy        79       218
Mathematical Reasoning  AMC 23    Accuracy        92.5     198
Mathematical Reasoning  MathQA    Accuracy        90       95
Mathematical Reasoning  MATH L5   Accuracy        0.75     86
