TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning

About

Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improved natural language interaction with such structured data, they often fall short in handling the complex, multi-step reasoning and robust code execution required for real-world table tasks. Reinforcement Learning (RL) offers a promising avenue to enhance these capabilities, yet its application in the tabular domain faces three critical hurdles: the scarcity of high-quality agentic trajectories with closed-loop code execution and environment feedback on diverse table structures, the extreme heterogeneity of feedback signals ranging from rigid SQL execution to open-ended data interpretation, and the risk of catastrophic forgetting of general knowledge during vertical specialization. To overcome these challenges and unlock advanced reasoning on complex tables, we introduce TableGPT-R1, a specialized tabular model built on a systematic RL framework. Our approach integrates a comprehensive data engineering pipeline that synthesizes difficulty-stratified agentic trajectories for both supervised alignment and RL rollouts, a task-adaptive reward system that combines rule-based verification with a criteria-injected reward model and incorporates process-level step reward shaping with behavioral regularization, and a multi-stage training framework that progressively stabilizes reasoning before specializing in table-specific tasks. Extensive evaluations demonstrate that TableGPT-R1 achieves state-of-the-art performance on authoritative benchmarks, significantly outperforming baseline models while retaining robust general capabilities. Our model is available at https://huggingface.co/tablegpt/TableGPT-R1.
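
To make the idea of a task-adaptive reward more concrete, here is a minimal Python sketch of how such routing might look. It is an illustration only and not the authors' implementation: the task-type names, the `judge` callable standing in for a criteria-injected reward model, the exact-match rule, and the `step_weight` and `penalty` parameters are all assumptions introduced for this example.

```python
from typing import Callable, Sequence

# Hypothetical set of task types whose answers can be checked by a rule.
VERIFIABLE_TASKS = {"text_to_sql", "fact_checking", "numerical_reasoning"}


def rule_based_reward(predicted: str, reference: str) -> float:
    """Return 1.0 on a case- and whitespace-insensitive exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def task_adaptive_reward(
    task_type: str,
    predicted: str,
    reference: str,
    judge: Callable[[str, str], float],
    step_successes: Sequence[bool] = (),
    step_weight: float = 0.1,
    penalty: float = 0.0,
) -> float:
    """Combine an outcome reward, a step-level bonus, and a behavioral penalty.

    `judge` is a placeholder for a learned reward model that is assumed to
    return a score in [0, 1]; it is only consulted for open-ended tasks.
    """
    if task_type in VERIFIABLE_TASKS:
        outcome = rule_based_reward(predicted, reference)
    else:
        outcome = judge(predicted, reference)

    # Process-level shaping (assumed form): fraction of steps that succeeded,
    # for example code cells that executed without error.
    shaping = 0.0
    if step_successes:
        shaping = step_weight * sum(step_successes) / len(step_successes)

    # `penalty` stands in for behavioral regularization, such as a
    # deduction for overly long or repetitive outputs.
    return outcome + shaping - penalty
```

Under these assumptions, a verifiable task such as Text-to-SQL is scored by the rule alone, while an open-ended data-analysis answer falls through to the judge; the step bonus and penalty are applied on top in both cases.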

Saisai Yang, Qingyi Huang, Jing Yuan, Liangyu Zha, Kai Tang, Yuhang Yang, Ning Wang, Yucheng Wei, Liyao Li, Wentao Ye, Hao Chen, Tao Zhang, Junlin Zhou, Haobo Wang, Gang Chen, Junbo Zhao • 2025

Related benchmarks

Task	Dataset	Result
Text-to-SQL	Spider	--	139
Fact Checking	RealHitBench	Exact Match 63.85	94
Structure Comprehending	RealHitBench	Exact Match (EM) 64.12	94
Text-to-SQL	Bird	Total Execution Accuracy 63.17	68
Numerical Reasoning	RealHitBench	Exact Match (EM) 49.03	66
Chart Generation	RealHitBench	ECR 55.84	60
Data Analysis	RealHitBench	GPT Score 66.53	60
Agent-based Data Analysis	InfiAgent-DABench	Accuracy 80.54	13
Data Processing	TableBench	Rge 48.35	13
Table Chain of Thought Reasoning	TableBench	Rge 48.28	13

Showing 10 of 14 rows
