Game of Thought: Robust Information Seeking with Large Language Models Using Game Theory
About
Large Language Models (LLMs) are increasingly deployed in real-world scenarios where they may lack sufficient information to complete a given task. In such settings, the ability to actively seek out missing information becomes a critical capability. Existing approaches to enhancing this ability often rely on simplifying assumptions that degrade *worst-case* performance, an issue with serious implications for high-stakes applications. In this work, we use the game of Twenty Questions to evaluate the information-seeking ability of LLMs. We introduce and formalize its adversarial counterpart, the Strategic Language Search (SLS) problem, along with its variants, as a two-player zero-sum extensive-form game. We propose Game of Thought (GoT), a framework that applies game-theoretic techniques to approximate a Nash equilibrium (NE) strategy for the restricted variant of the game. Empirical results demonstrate that our approach consistently improves worst-case performance compared to (1) direct prompting-based methods and (2) heuristic-guided search methods across all tested settings.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 20 Questions | 20Q Common | Worst-Case Interaction Length | 10 | 8 |
| 20 Questions | 20Q S128 | Worst-Case Interaction Length | 10.8 | 8 |
| 20 Questions | 20Q Breeds | Worst-Case Interaction Length | 6.6 | 8 |
| Medical Diagnosis | MD DX | Worst-Case Interaction Length | 10.5 | 8 |
| Troubleshooting | TS FloDial | Worst-Case Interaction Length | 7.5 | 8 |
| Information Seeking | 20Q Breeds weighted (test) | Worst-Case Weighted Payoff | 32.3 | 8 |
| Information Seeking | 20Q Common weighted (test) | Worst-Case Weighted Payoff | 152.1 | 8 |
| Medical Diagnosis | MD DX weighted (test) | Worst-Case Weighted Payoff | 78.3 | 8 |
| Troubleshooting | TS FloDial weighted (test) | Worst-Case Weighted Payoff | 62.3 | 8 |
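The worst-case interaction length reported above can be made concrete with a minimal minimax sketch of a Twenty Questions-style game: the questioner picks the yes/no question that minimizes the number of questions an adversarial answerer can force before the target is uniquely identified. The toy encoding below (candidates as attribute bit-tuples, questions as attribute indices) and the function name are illustrative assumptions for this page, not the paper's GoT implementation, which approximates a Nash equilibrium of the full extensive-form game.

```python
from functools import lru_cache

def worst_case_questions(candidates, questions):
    """Fewest questions that guarantee identifying any candidate,
    against an adversarial answerer (minimax over the game tree).

    candidates: iterable of attribute tuples, e.g. (0, 1, 1)
    questions: attribute indices; question q asks "is attribute q true?"
    """
    cands = frozenset(candidates)

    @lru_cache(maxsize=None)
    def solve(cset):
        # One (or zero) candidates left: the target is identified.
        if len(cset) <= 1:
            return 0
        best = float("inf")
        for q in questions:
            yes = frozenset(c for c in cset if c[q])
            no = cset - yes
            if not yes or not no:
                continue  # uninformative: every candidate answers the same
            # Adversary picks the worse branch (max); questioner picks
            # the question minimizing that worst case (min).
            best = min(best, 1 + max(solve(yes), solve(no)))
        return best

    return solve(cands)
```

For instance, four candidates spanning all two-attribute combinations need two questions in the worst case, matching the intuition that each perfectly balanced question halves the candidate set.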