TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning
About
Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learning (RL) offers a promising avenue to enhance these capabilities, yet its application in the tabular domain faces three critical hurdles: the scarcity of high-quality agentic trajectories with closed-loop code execution and environment feedback on diverse table structures; the extreme heterogeneity of feedback signals, ranging from rigid SQL execution to open-ended data interpretation; and the risk of catastrophic forgetting of general knowledge during vertical specialization. To overcome these challenges and unlock advanced reasoning on complex tables, we introduce TableGPT-R1, a specialized tabular model built on a systematic RL framework. Our approach integrates three components: a comprehensive data engineering pipeline that synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts; a task-adaptive reward system that combines rule-based verification with a criteria-injected reward model and incorporates process-level step reward shaping with behavioral regularization; and a multi-stage training framework that progressively stabilizes reasoning before specializing in table-specific tasks. Extensive evaluations demonstrate that TableGPT-R1 achieves state-of-the-art performance on authoritative benchmarks, significantly outperforming baseline models while retaining robust general capabilities. Our model is available at https://huggingface.co/tablegpt/TableGPT-R1.
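To make the idea of a task-adaptive reward more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple dispatch on task type: verifiable tasks are scored by rule-based checks (here, comparing SQL execution results in SQLite, or normalized string matching), while open-ended tasks fall back to a scoring callable. The names `sql_execution_reward`, `task_adaptive_reward`, and `toy_judge` are hypothetical, the task-type labels are invented for illustration, and `toy_judge` is only a token-overlap placeholder standing in for a learned, criteria-injected reward model. Process-level step reward shaping and behavioral regularization are not modeled.

```python
import sqlite3
from typing import Callable, Optional


def sql_execution_reward(db: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Rule-based check: 1.0 if both queries return the same rows (order ignored), else 0.0."""
    try:
        predicted_rows = db.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # a query that fails to execute earns no reward
    gold_rows = db.execute(gold_sql).fetchall()
    same = sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)
    return 1.0 if same else 0.0


def toy_judge(response: str, reference: str) -> float:
    """Hypothetical placeholder for a reward model: fraction of reference tokens in the response."""
    reference_tokens = reference.lower().split()
    if not reference_tokens:
        return 0.0
    response_tokens = set(response.lower().split())
    hits = sum(token in response_tokens for token in reference_tokens)
    return hits / len(reference_tokens)


def task_adaptive_reward(
    task_type: str,
    response: str,
    reference: str,
    db: Optional[sqlite3.Connection] = None,
    judge: Callable[[str, str], float] = toy_judge,
) -> float:
    """Route each sample to a verifier suited to its task type (labels are assumptions)."""
    if task_type == "text_to_sql":
        if db is None:
            raise ValueError("text_to_sql rewards need a database connection")
        return sql_execution_reward(db, response, reference)
    if task_type == "exact_answer":
        return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0
    # Open-ended tasks have no rigid ground truth, so defer to the judge.
    return judge(response, reference)
```

The design choice illustrated here is that a single RL loop can consume heterogeneous feedback as long as every task type maps to a scalar reward; in a real system the judge would be a trained model and the rule-based checks would be considerably stricter.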
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-SQL | Spider | -- | -- | 57 |
| Chart Generation | RealHitBench | ECR | 55.84 | 49 |
| Fact Checking | RealHitBench | Exact Match | 63.85 | 49 |
| Structure Comprehending | RealHitBench | Exact Match (EM) | 64.12 | 49 |
| Data Analysis | RealHitBench | GPT Score | 66.53 | 49 |
| Text-to-SQL | Bird | Total Execution Accuracy | 63.17 | 22 |
| Numerical Reasoning | RealHitBench | Exact Match (EM) | 49.03 | 21 |
| Agent-based Data Analysis | InfiAgent-DABench | Accuracy | 80.54 | 13 |
| Data Processing | TableBench | Rge | 48.35 | 13 |
| Table Chain of Thought Reasoning | TableBench | Rge | 48.28 | 13 |