XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL

About

To leverage the advantages of LLMs in addressing challenges in the Text-to-SQL task, we present XiYan-SQL, a framework that generates and utilizes multiple SQL candidates. It consists of three components: 1) a Schema Filter module that filters the database schema and produces multiple relevant schemas; 2) a multi-generator ensemble approach that generates multiple high-quality and diverse SQL queries; 3) a selection model with a candidate reorganization strategy that picks the optimal SQL query. Specifically, for the multi-generator ensemble, we employ a multi-task fine-tuning strategy to strengthen the SQL generation models' intrinsic alignment between SQL and text, and we construct multiple generation models with distinct generation styles by fine-tuning across different SQL formats. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Overall, XiYan-SQL achieves a new SOTA performance of 75.63% on the BIRD benchmark, surpassing all previous methods. It also attains SOTA performance on the Spider test set with an accuracy of 89.65%.
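The three-stage flow described above (schema filtering, multi-candidate generation, selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the XiYan-SQL implementation: the keyword-based `filter_schema` stands in for the Schema Filter module, the `generators` are hypothetical callables standing in for the fine-tuned generation models, and `select_candidate` swaps in a simple majority vote over execution results in place of the paper's fine-tuned selection model and its candidate reorganization strategy. All function names here are invented for illustration.

```python
import sqlite3
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a generator maps (question, schema text) to one SQL string.
Generator = Callable[[str, str], str]


def filter_schema(question: str, schema: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Toy stand-in for a schema filter: keep tables whose name or columns
    are mentioned in the question."""
    q = question.lower()
    kept = {
        table: cols
        for table, cols in schema.items()
        if table.lower() in q or any(c.lower() in q for c in cols)
    }
    return kept or schema  # fall back to the full schema if nothing matched


def execute(conn: sqlite3.Connection, sql: str) -> Optional[Tuple]:
    """Run a candidate query; return an order-insensitive result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def select_candidate(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Assumed stand-in for the selection step: group executable candidates by
    their execution result and return one query from the largest group."""
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for sql in candidates:
        result = execute(conn, sql)
        if result is not None:
            groups[result].append(sql)
    if not groups:
        return None
    return max(groups.values(), key=len)[0]


def text_to_sql(
    question: str,
    schema: Dict[str, List[str]],
    generators: List[Generator],
    conn: sqlite3.Connection,
) -> Optional[str]:
    """Filter the schema, collect one candidate per generator, then select."""
    relevant = filter_schema(question, schema)
    schema_text = "\n".join(f"{t}({', '.join(cols)})" for t, cols in relevant.items())
    candidates = [gen(question, schema_text) for gen in generators]
    return select_candidate(conn, candidates)
```

In this sketch each generator contributes one candidate, candidates that fail to execute are dropped, and ties between equally large result groups go to the group seen first. A learned selector, as the abstract describes, would replace the vote with a model that ranks the reorganized candidates.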

Yifu Liu, Yin Zhu, Yingqi Gao, Zhiling Luo, Xiaoxia Li, Xiaorong Shi, Yuntao Hong, Jinyang Gao, Yu Li, Bolin Ding, Jingren Zhou • 2025

Related benchmarks

Task	Dataset	Result
Text-to-SQL	BIRD (dev)	Execution Accuracy (EA) 73.34	387
Text-to-SQL	Spider (test)	Execution Accuracy 89.65	213
Text-to-SQL	Spider (dev)	EX 63.96	147
Text-to-SQL	Spider	Exec Acc (All) 72.73	139
Text-to-SQL	Spider 1.0 (test)	EM Acc (Overall) 89.65	110
Text-to-SQL	Bird	Execution Accuracy (EX) 57.6	63
Text-to-SQL	BIRD (test)	EX 75.63	46
Text-to-SQL	BIRD-SQL Mini (dev)	Execution Accuracy (EX) 52.2	21
Text-to-SQL	Bird	Execution Accuracy 57.6	20
Text-to-SQL	Mini (dev)	Execution Accuracy (EX) 52.2	9
