
Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

About

Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs). However, LLMs remain vulnerable to plausible distractors, which often divert attention toward irrelevant choices and cause unstable oscillation between correct and incorrect answers. In this paper, we propose Inclusion-of-Thoughts (IoT), a progressive self-filtering strategy designed to mitigate this cognitive load (i.e., the instability of model preferences in the presence of distractors) and enable the model to focus more effectively on plausible answers. Our method reconstructs the MCQ using only the plausible option choices, providing a controlled setting for examining comparative judgements and, therefore, the stability of the model's internal reasoning under perturbation. By explicitly documenting this filtering process, IoT also enhances the transparency and interpretability of the model's decision-making. Extensive empirical evaluation demonstrates that IoT substantially boosts chain-of-thought performance across a range of arithmetic, commonsense reasoning, and educational benchmarks with minimal computational overhead.
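The filtering-and-reconstruction loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the plausibility scorer, and the threshold are all assumptions, and the two "toy" functions stand in for real LLM calls.

```python
# Hedged sketch of an IoT-style self-filtering loop. All names here
# (filter_options, answer_mcq, plausibility_fn, threshold) are illustrative
# assumptions; the paper's actual prompting strategy may differ.

def filter_options(question, options, plausibility_fn, threshold=0.5):
    """Keep only options the model judges plausible, logging each decision
    so the filtering process stays transparent and interpretable."""
    kept, log = [], []
    for label, text in options.items():
        score = plausibility_fn(question, text)
        log.append((label, score, score >= threshold))
        if score >= threshold:
            kept.append((label, text))
    # Never filter everything away: fall back to the single best-scoring option.
    if not kept:
        best = max(log, key=lambda t: t[1])[0]
        kept = [(best, options[best])]
    return dict(kept), log

def answer_mcq(question, options, plausibility_fn, answer_fn, threshold=0.5):
    """Reconstruct the MCQ from the plausible options only, then answer it."""
    purified, log = filter_options(question, options, plausibility_fn, threshold)
    return answer_fn(question, purified), log

# Toy stand-ins for the LLM calls, so the sketch runs end to end.
def toy_plausibility(question, option):
    # An arithmetic question: only numeric options are plausible.
    return 0.9 if option.isdigit() else 0.1

def toy_answer(question, options):
    # Pick the option with the largest numeric value (purely for demonstration).
    return max(options, key=lambda k: int(options[k]) if options[k].isdigit() else -1)

opts = {"A": "12", "B": "a banana", "C": "7", "D": "blue"}
choice, trace = answer_mcq("What is 5 + 7?", opts, toy_plausibility, toy_answer)
# The distractors "a banana" and "blue" are filtered out before answering,
# and `trace` records the plausibility judgement for every original option.
```

The returned `trace` is what makes the decision process auditable: it documents which distractors were excluded and why, mirroring the transparency claim in the abstract.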

Mohammad Reza Ghasemi Madani, Soyeon Caren Han, Shuo Yang, Jey Han Lau • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Mathematical Reasoning | AQUA | Accuracy 79.13 | 146 |
| Commonsense Question Answering | CSQA | Accuracy 84.54 | 58 |
| Reasoning Question Answering | ARC | Accuracy 94.62 | 21 |
| General Knowledge Question Answering | MMLU | Accuracy 82.06 | 18 |
| Commonsense Question Answering | OBQA | Accuracy 93.4 | 14 |
| General Reasoning Average | Aggregate (OBQA, CSQA, SIQA, ARC, MMLU, GSM8K-MC, AQUA) | Average Accuracy 86.21 | 14 |
| Social Commonsense Question Answering | SIQA | Accuracy 80.04 | 14 |
| Mathematical Reasoning | GSM8K-MC | Accuracy 95 | 14 |
