Enhancing Table Reasoning with Deterministic Table-State Rewards
About
Large Language Models (LLMs) struggle with multi-step reasoning over structured tables, primarily because intermediate reasoning states receive no explicit supervision. Existing learned reward models and executor-based verifiers are either unscalable or depend on answer-checking environments that are unavailable for many tabular tasks, which leaves no reward signal that is both scalable and grounded in the query. To address this, we introduce TABROUGE, a training-free, deterministic state reward. By adapting the Longest Common Subsequence (LCS) metric from text summarization to tabular states, TABROUGE assesses the lexical coverage and structural integrity of intermediate tables against the query without requiring learned models or external executors. Building on this metric, we propose RE-TAB, a plug-and-play, training-free framework that reframes table reasoning as deterministic control over intermediate states, using TABROUGE both for stepwise feedback and as a trajectory-level test-time scaling (TTS) signal. Across six backbones and three benchmarks, RE-TAB improves accuracy by an average of 26.7 pp over no-reward baselines and reduces the number of TTS samples by up to 33%. Preliminary GRPO experiments further indicate that TABROUGE is viable as a scalable post-training reward, yielding a gain of 8.34 pp. Finally, we analyze failure modes of TABROUGE, including paraphrase under-rewarding and echo-column hacking, and identify when structure-aware lexical rewards remain reliable.
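The abstract does not give TABROUGE's exact formula, so the following is only a minimal sketch of the general idea under stated assumptions: it scores an intermediate table by the ROUGE-L F-measure (LCS-based) between the query tokens and a row-major linearization of the table, and it stands in for "structural integrity" with a hypothetical well-formedness gate that returns 0 for ragged tables. All names (`tokenize`, `lcs_length`, `linearize`, `tab_rouge_sketch`, `pick_best`) are hypothetical and are not the authors' API; `pick_best` illustrates how such a score could serve as a best-of-N, trajectory-level TTS signal.

```python
import re
from typing import List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"\w+", text.lower())


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    # Classic O(len(a) * len(b)) LCS dynamic program, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def linearize(header: List[str], rows: List[List[str]]) -> List[str]:
    # Row-major serialization (assumption): header cells first, then each row.
    tokens: List[str] = []
    for row in [header] + rows:
        for cell in row:
            tokens.extend(tokenize(str(cell)))
    return tokens


def tab_rouge_sketch(query: str, header: List[str], rows: List[List[str]]) -> float:
    # Hypothetical structural gate: an empty header or any row whose width
    # differs from the header's scores 0.
    if not header or any(len(r) != len(header) for r in rows):
        return 0.0
    q = tokenize(query)
    t = linearize(header, rows)
    lcs = lcs_length(q, t) if q and t else 0
    if lcs == 0:
        return 0.0
    recall = lcs / len(q)     # share of the query covered by the table state
    precision = lcs / len(t)  # penalizes tables padded with irrelevant cells
    return 2 * precision * recall / (precision + recall)


def pick_best(query: str, candidates: List[Tuple[List[str], List[List[str]]]]) -> int:
    # Best-of-N selection: index of the candidate table with the highest score.
    scores = [tab_rouge_sketch(query, h, r) for h, r in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


query = "Which country won the most gold medals?"
table_a = (["country", "gold"], [["Norway", "16"], ["Germany", "12"]])
table_b = (["athlete", "sport"], [["Smith", "ski"]])
print(tab_rouge_sketch(query, *table_a))       # LCS = 2: recall 2/7, precision 2/6
print(pick_best(query, [table_a, table_b]))
```

Note that a purely lexical score like this can be gamed: a table whose only header cell repeats the query verbatim scores a perfect 1.0 while containing no answer, which is one way to see how the echo-column hacking failure mode named above could arise.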
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Table Question Answering | WikiTQ | Accuracy | 91.84 | 149 |
| Table Question Answering | MMQA | Accuracy | 86.08 | 10 |
| Table Question Answering | MMTU | Accuracy | 90.22 | 10 |
| Table Question Answering | WikiTQ | BLEU | 63.19 | 5 |
| Table Question Answering | MMTU | BLEU | 21.19 | 3 |