MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
About
As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. Integrating multi-turn tool invocation with dynamic feedback could improve adaptability and robustness, and ultimately model performance. To address this limitation, we propose MTIR-SQL, a Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Because MTIR training is unstable and the model distribution can deviate significantly from the initial model, we enhance the GRPO algorithm by adding a trajectory filtering mechanism and removing the KL loss constraint. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on Spider Dev, significantly outperforming existing approaches.
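The execution-aware multi-turn loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`execute_sql`, `multi_turn_rollout`, `toy_policy`), the turn limit, the stopping rule, and the toy `singer` table are all hypothetical, and a hard-coded function stands in for the LLM. It only shows the general idea of feeding each query's execution result, or error message, back into the next generation step.

```python
import sqlite3
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (generated SQL, execution observation)


def execute_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 5) -> Tuple[bool, str]:
    """Run a query and return (succeeded, observation text) for the model to read."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        return True, f"rows: {rows}"
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def multi_turn_rollout(
    conn: sqlite3.Connection,
    question: str,
    propose_sql: Callable[[str, List[Turn]], str],
    max_turns: int = 3,  # assumed limit, not taken from the paper
) -> List[Turn]:
    """Alternate between query generation and execution until a query runs."""
    history: List[Turn] = []
    for _ in range(max_turns):
        sql = propose_sql(question, history)
        ok, observation = execute_sql(conn, sql)
        history.append((sql, observation))
        if ok:  # assumed stopping rule: stop at the first query that executes
            break
    return history


def toy_policy(question: str, history: List[Turn]) -> str:
    """Hypothetical stand-in for the LLM: misspells a column, then corrects it."""
    if not history:
        return "SELECT nam FROM singer WHERE age > 30"
    return "SELECT name FROM singer WHERE age > 30 ORDER BY name"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ana", 41), ("Bo", 25), ("Cy", 33)])

trajectory = multi_turn_rollout(conn, "Which singers are older than 30?", toy_policy)
for sql, observation in trajectory:
    print(sql, "->", observation)
```

In this toy run the first query fails on the misspelled column, the error text is appended to the history, and the second turn produces a query that executes. In an RL setting, a trajectory like this would be the unit that receives a reward.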
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-SQL | BIRD (dev) | Execution Accuracy (EA) | 68.1 | 387 |
| Text-to-SQL | Spider (test) | -- | -- | 213 |
| Text-to-SQL | Spider (dev) | EX | 86.7 | 147 |
| Text-to-SQL | Spider-DK | Execution Accuracy (EX) | 76.3 | 95 |
| Text-to-SQL | Spider-Syn | Execution Accuracy (EX) | 81 | 79 |
| Text-to-SQL | Spider-Realistic | Execution Accuracy (EX) | 81.1 | 47 |
| Text2SQL | BIRD (dev) | Exec Acc (Greedy) | 63.1 | 44 |
| Text-to-SQL | Bird | Accuracy | 63.1 | 27 |