ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
About
Recent advancements in Large Language Models (LLMs) have significantly catalyzed table-based question answering (TableQA). However, existing TableQA benchmarks often overlook the intricacies of industrial scenarios, which are characterized by multi-table structures, nested headers, and massive scale. These environments demand robust table reasoning through deep structured inference, presenting a significant challenge that remains inadequately addressed by current methodologies. To bridge this gap, we present ReasonTabQA, a large-scale bilingual benchmark encompassing 1,932 tables across 30 industry domains, such as energy and automotive. ReasonTabQA provides high-quality annotations for both final answers and explicit reasoning chains, supporting both thinking and no-thinking paradigms. Furthermore, we introduce TabCodeRL, a reinforcement learning method that leverages table-aware verifiable rewards to guide the generation of logical reasoning paths. Extensive experiments on ReasonTabQA and four TableQA datasets demonstrate that while TabCodeRL yields substantial performance gains on open-source LLMs, the persistent performance gap on ReasonTabQA underscores the inherent complexity of real-world industrial TableQA.
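To make the "table-aware verifiable reward" idea concrete, here is a minimal Python sketch of one plausible design: the policy emits a pandas program, the program is executed against the table, and the reward is exact match against the gold answer after light normalization. All names here (`verifiable_reward`, `normalize_cell`) and the exact-match design are illustrative assumptions, not TabCodeRL's actual reward.

```python
# Hypothetical sketch of a table-aware verifiable reward for TableQA RL.
# The exact-match design and all names are illustrative assumptions,
# not the reward used by TabCodeRL.
from typing import Any

import pandas as pd


def normalize_cell(value: Any) -> str:
    """Normalize an answer so '2,000' and 2000.0 compare equal."""
    text = str(value).strip().lower().replace(",", "")
    try:
        return f"{float(text):g}"  # canonical numeric form
    except ValueError:
        return text


def verifiable_reward(program: str, table: pd.DataFrame, gold: str) -> float:
    """Run a model-generated pandas program against `table`; return 1.0
    only if it executes and its `answer` matches the gold answer."""
    scope = {"table": table, "answer": None}
    try:
        # The rollout is expected to assign `answer`. Real training code
        # would sandbox this execution; exec() here is for illustration.
        exec(program, {"pd": pd}, scope)
    except Exception:
        return 0.0  # non-executable rollouts earn no reward
    return 1.0 if normalize_cell(scope["answer"]) == normalize_cell(gold) else 0.0


# Usage: score one rollout from the policy model.
df = pd.DataFrame({"plant": ["A", "B"], "output_mwh": [1200, 800]})
rollout = "answer = table['output_mwh'].sum()"
print(verifiable_reward(rollout, df, "2,000"))  # -> 1.0
```

A binary, execution-based reward of this kind is verifiable by construction: the policy is rewarded only for programs that both run and produce the correct answer, rather than for answer strings that merely look plausible.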
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Table Question Answering | WTQ | Accuracy | 83.07 | 101 |
| Table Question Answering | HiTab | Accuracy | 80.72 | 67 |
| Table Question Answering | AIT-QA | Accuracy | 71.06 | 41 |
| Tabular Question Answering | ReasonTabQA 1.0 (Overall) | Overall Accuracy | 61.89 | 33 |
| Table Question Answering | ReasonTabQA 1.0 (test) | Accuracy | 61.89 | 17 |
| Table Question Answering | MimoTable | Accuracy | 70.33 | 17 |