Enhancing Table Reasoning with Deterministic Table-State Rewards

About

Large Language Models (LLMs) struggle with multi-step reasoning over structured tables, primarily because intermediate reasoning states receive no explicit supervision. Existing learned reward models scale poorly, and executor-based verifiers rely on answer-checking environments that are unavailable for many tabular tasks. This leaves no signal that is both scalable and grounded in the query. To address this, we introduce TABROUGE, a training-free and deterministic state reward. By adapting the Longest Common Subsequence (LCS) metric from text summarization to evaluate tabular states, TABROUGE assesses the lexical coverage and structural integrity of intermediate tables against the query without requiring learned models or external executors. Built upon this metric, we propose RE-TAB, a plug-and-play, training-free framework. RE-TAB reframes table reasoning as deterministic control over intermediate states, using TABROUGE for stepwise feedback and for trajectory-level test-time scaling (TTS) signals. Across six backbones and three benchmarks, RE-TAB improves accuracy by an average of 26.7 pp over no-reward baselines and reduces TTS samples by up to 33%. Preliminary GRPO experiments further indicate TABROUGE's viability as a scalable post-training reward, increasing gains by 8.34 pp. We also analyze failure modes of TABROUGE, including paraphrase under-rewarding and echo-column hacking, and identify when structure-aware lexical rewards remain reliable.
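To make the LCS idea concrete, the following is a minimal sketch of an LCS-based lexical coverage score between a query and a linearized intermediate table. It is not the paper's TABROUGE definition: the whitespace tokenizer, the row-major linearization, the recall-style normalization, and the names lcs_length and lcs_coverage are all assumptions made for illustration, and the structural-integrity component mentioned in the abstract is not modeled here.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization; a real system may differ.
    return text.lower().split()


def lcs_coverage(query: str, table: List[List[str]]) -> float:
    """Hypothetical score: share of query tokens matched, in order,
    by the table linearized row by row (header row first)."""
    q_tokens = tokenize(query)
    if not q_tokens:
        return 0.0
    t_tokens: List[str] = []
    for row in table:
        for cell in row:
            t_tokens.extend(tokenize(str(cell)))
    return lcs_length(q_tokens, t_tokens) / len(q_tokens)


if __name__ == "__main__":
    table = [
        ["country", "population", "year"],
        ["france", "67", "2020"],
    ]
    # "population", "france", "2020" are matched in order: 3 of 5 query tokens.
    print(lcs_coverage("population of france in 2020", table))
```

Because the score depends only on the query and the table text, it is deterministic and needs no learned model or executor. It also illustrates the paraphrase under-rewarding failure mode named above: a table that answers the query with different wording would score low under a purely lexical match.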

Tung Sum Thomas Kwok, Xinyu Wang, Hengzhi He, Xiaofeng Lin, Peng Lu, Liheng Ma, Chunhe Wang, Chun Ho Mak, Yuyu Luo, Ying Nian Wu, Lei Ding, Guang Cheng • 2026

Related benchmarks

Task	Dataset	Result
Table Question Answering	WikiTQ	Accuracy 91.84	149
Table Question Answering	MMQA	Accuracy 86.08	10
Table Question Answering	MMTU	Accuracy 90.22	10
Table Question Answering	WikiTQ	BLEU 63.19	5
Table Question Answering	MMTU	BLEU 21.19	3

