SQL-Trail: Multi-Turn Reinforcement Learning with Interleaved Feedback for Text-to-SQL

About

While large language models (LLMs) have substantially improved Text-to-SQL generation, a pronounced gap remains between AI systems and human experts on challenging benchmarks such as BIRD-SQL. We argue this gap stems largely from the prevailing single-pass paradigm, which lacks the iterative reasoning, schema exploration, and error-correction behaviors that humans naturally employ. To address this limitation, we introduce SQL-Trail, a multi-turn reinforcement learning (RL) agentic framework for Text-to-SQL. Rather than producing a query in one shot, SQL-Trail interacts with the database environment and uses execution feedback to iteratively refine its predictions. Our approach centers on two key ideas: (i) an adaptive turn-budget allocation mechanism that scales the agent's interaction depth to match question difficulty, and (ii) a composite reward panel that jointly incentivizes SQL correctness and efficient exploration. Across benchmarks, SQL-Trail sets a new state of the art and delivers strong data efficiency, up to 18x higher than prior single-pass RL state-of-the-art methods. Notably, our 7B and 14B models outperform substantially larger proprietary systems by 5% on average, underscoring the effectiveness of interactive, agentic workflows for robust Text-to-SQL generation.
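
The interaction pattern described in the abstract (propose a query, execute it, read the feedback, refine) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_sql`, `multi_turn_sql`, and `toy_policy` are hypothetical names, the policy is a hard-coded stand-in for a trained model, `max_turns` is a fixed stand-in for the adaptive turn budget, and stopping at the first query that executes is a simplifying assumption.

```python
import sqlite3
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (sql, feedback) pairs from earlier turns


def run_sql(conn: sqlite3.Connection, sql: str) -> Tuple[bool, str]:
    """Execute a query and return (succeeded, feedback text for the agent)."""
    try:
        rows = conn.execute(sql).fetchall()
        return True, f"{len(rows)} row(s), first rows: {rows[:3]}"
    except sqlite3.Error as exc:
        return False, f"error: {exc}"


def multi_turn_sql(
    conn: sqlite3.Connection,
    question: str,
    propose: Callable[[str, History], str],  # hypothetical policy interface
    max_turns: int,  # stand-in for the turn budget
) -> Tuple[str, History]:
    """Refine a query over several turns using execution feedback."""
    history: History = []
    sql = ""
    for _ in range(max_turns):
        sql = propose(question, history)
        ok, feedback = run_sql(conn, sql)
        history.append((sql, feedback))
        if ok:  # assumption: commit to the first query that executes
            break
    return sql, history


def toy_policy(question: str, history: History) -> str:
    """Hard-coded stand-in for a model: misspells a column, then corrects it."""
    if not history:
        return "SELECT nme FROM singer"
    return "SELECT name FROM singer"


# Toy in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ada', 30)")

final_sql, history = multi_turn_sql(conn, "List all singer names.", toy_policy, max_turns=3)
```

In this toy run the first turn fails with a SQLite error about the unknown column, and the second turn returns the corrected query. A trained agent would decide for itself when to explore the schema and when to commit to an answer.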

Harper Hua, Zhen Han, Zhengyuan Shen, Jeremy Lee, Patrick Guan, Qi Zhu, Sullam Jeoung, Yueyan Chen, Yunfei Bai, Shuai Wang, Vassilis Ioannidis, Huzefa Rangwala • 2026

Related benchmarks

Task	Dataset	Result
Text-to-SQL	Spider (test)	--	213
Text-to-SQL	Spider (dev)	--	147
Text-to-SQL	Spider-DK	--	95
Text-to-SQL	Spider-Syn	--	79
Text-to-SQL	Spider-Realistic	--	47
Text2SQL	BIRD (dev)	Exec Acc (Greedy): 63.6	44
Text2SQL	Spider (test)	Exec Acc (Greedy): 86.8	37
