DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models

About

Leading models for the text-to-SQL task heavily rely on proprietary Large Language Models (LLMs), posing concerns over data privacy. Closing the performance gap between small open-source models and large proprietary models is crucial to mitigate this reliance. To this end, we introduce a novel two-stage fine-tuning approach that decomposes the task into two simpler tasks. Through comprehensive evaluation on two large cross-domain datasets and two small LLMs, we show that this approach improves execution accuracy by 3 to 7 percent, effectively aligning the performance of open-source models with their proprietary counterparts.

Mohammadreza Pourreza, Davood Rafiei • 2024
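
The abstract does not name the two sub-tasks. The sketch below assumes, following the paper, that the first stage is schema linking (selecting the tables relevant to the question) and the second is SQL generation over the reduced schema. All function names, prompt wording, and the stub models are hypothetical and stand in for two separately fine-tuned models; this is an illustration of the decomposition, not the authors' implementation.

```python
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # table name -> column names
Model = Callable[[str], str]   # hypothetical interface: prompt in, text out


def format_schema(schema: Schema) -> str:
    """Render a schema as one 'table(col1, col2)' line per table."""
    return "\n".join(f"{t}({', '.join(cols)})" for t, cols in schema.items())


def build_linking_prompt(question: str, schema: Schema) -> str:
    # Assumed prompt wording; the real prompts may differ.
    return f"Schema:\n{format_schema(schema)}\nQuestion: {question}\nRelevant tables:"


def build_generation_prompt(question: str, schema: Schema) -> str:
    # Assumed prompt wording; the real prompts may differ.
    return f"Schema:\n{format_schema(schema)}\nQuestion: {question}\nSQL:"


def two_stage_text_to_sql(question: str, schema: Schema,
                          link_model: Model, sql_model: Model) -> str:
    # Stage 1: ask the linking model for a comma-separated list of tables.
    raw = link_model(build_linking_prompt(question, schema))
    linked = [name.strip() for name in raw.split(",")]
    # Keep only tables that exist in the schema.
    reduced = {t: schema[t] for t in linked if t in schema}
    # Stage 2: generate SQL from the question and the reduced schema only.
    return sql_model(build_generation_prompt(question, reduced))


# Stub models (hypothetical) so the sketch runs without any LLM.
seen_prompts: List[str] = []


def stub_link_model(prompt: str) -> str:
    return "singer"


def stub_sql_model(prompt: str) -> str:
    seen_prompts.append(prompt)  # record what stage 2 actually sees
    return "SELECT name FROM singer WHERE age > 40"


schema = {
    "singer": ["id", "name", "age"],
    "concert": ["concert_id", "venue", "year"],
}
sql = two_stage_text_to_sql("Which singers are older than 40?", schema,
                            stub_link_model, stub_sql_model)
```

With the stubs above, the second-stage prompt contains only the `singer` table, which is the point of the decomposition: the generator works from a smaller, more relevant schema.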

Related benchmarks

Task	Dataset	Metric	Result
Text-to-SQL	BIRD (dev)	Execution Accuracy (EA)	61.56	477
Text-to-SQL	Spider (test)	Execution Accuracy	84.4	256
Text-to-SQL	Spider (dev)	EX	85.5	196
Text-to-SQL	Spider	Exec Acc (All)	85.09	139
Text-to-SQL	Spider 1.0 (test)	EM Acc (Overall)	77	110
Text-to-SQL	Bird	Execution Accuracy (EX)	42.18	83
Text-to-SQL	LogicCat	Exact Match	14.88	58
Text-to-SQL	Archer (dev)	Execution Accuracy	33.17	45
Text-to-SQL	Spider	Execution Accuracy (EX)	73.5	38
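
Most rows above report execution accuracy: the share of questions for which the predicted SQL returns the same result as the reference SQL when both are run against the database. A minimal sketch of that idea, assuming SQLite and an order-insensitive multiset comparison of rows, is shown below. The official Spider and BIRD evaluation scripts may differ in their details, and the toy table and queries are made up for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Percentage of (predicted, gold) pairs whose results match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        # A failing prediction counts as wrong; row order is ignored (assumption).
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return 100.0 * correct / len(pairs)


# Toy in-memory database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Ana', 30), (2, 'Ben', 45), (3, 'Cy', 52);
""")

pairs = [
    # Different SQL text, same result: counted as correct.
    ("SELECT name FROM singer WHERE age > 40",
     "SELECT name FROM singer WHERE age >= 41"),
    # Different counts (3 vs 2): counted as wrong.
    ("SELECT count(*) FROM singer",
     "SELECT count(*) FROM singer WHERE age > 40"),
    # Misspelled column makes the prediction fail: counted as wrong.
    ("SELECT nme FROM singer",
     "SELECT name FROM singer"),
    # Same rows in a different order: counted as correct.
    ("SELECT name, age FROM singer ORDER BY age DESC",
     "SELECT name, age FROM singer"),
]

score = execution_accuracy(conn, pairs)
```

Unlike exact-match metrics, this comparison credits a prediction whose SQL text differs from the reference as long as the returned rows agree.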
