
Knowledge Distillation for Low-Resource Open-source Text-to-SQL Model

About

Text-to-SQL converts natural language questions into executable SQL queries, enabling non-technical users to access relational databases for analytics and intelligent data services. In real-world scenarios, performance is often constrained by low-resource settings, where high-quality annotated <question, SQL> pairs are scarce, particularly for domain-specific databases. Additional challenges include opaque schema definitions, abbreviations, and implicit business logic that is not explicitly encoded in the schema. Existing data synthesis and prompting techniques improve coverage but often fail to produce task-specific, semantically grounded examples aligned with database constraints. To address these challenges, we propose a knowledge-aware Text-to-SQL framework that constructs a task-specific knowledge base covering schema semantics, abbreviations, business logic, and query patterns, and injects this knowledge into both training and inference. The framework generates diverse, contextually grounded synthetic training data and enhances inference through targeted knowledge retrieval. Experiments on seven benchmarks, covering both general and domain-specific datasets, demonstrate that our approach substantially improves the performance of open-source and closed-source large language models on Text-to-SQL tasks, especially in low-resource domain-specific settings, enhancing generalization, robustness, and adaptability.
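The abstract does not specify how knowledge is retrieved or injected at inference time, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical `KnowledgeEntry` record, a naive token-overlap retriever, and a made-up prompt layout; a real system would more likely use embedding-based retrieval and a model-specific prompt template.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeEntry:
    # Hypothetical record type; the kinds mirror those named in the abstract.
    kind: str  # "schema", "abbreviation", "business_logic", or "query_pattern"
    text: str


def tokenize(s: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", s.lower()))


def retrieve(question: str, kb: list[KnowledgeEntry], top_k: int = 2) -> list[KnowledgeEntry]:
    """Return up to top_k entries sharing the most tokens with the question.

    Token overlap is a stand-in (an assumption) for whatever retriever is used.
    Ties are broken by the entry's position in the knowledge base.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(e.text)), i, e) for i, e in enumerate(kb)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [e for _, _, e in scored[:top_k]]


def build_prompt(question: str, schema: str, kb: list[KnowledgeEntry], top_k: int = 2) -> str:
    """Assemble a prompt that places retrieved knowledge next to the schema."""
    lines = ["-- Schema", schema, "-- Relevant knowledge"]
    lines += [f"-- [{e.kind}] {e.text}" for e in retrieve(question, kb, top_k)]
    lines += ["-- Question", f"-- {question}", "SELECT"]
    return "\n".join(lines)
```

Given a knowledge base that explains an abbreviation such as LOS, a question that mentions LOS would pull that entry into the prompt, giving the model the definition that the schema alone does not provide.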

Tianhao Qiu, Xiaojun Chen • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-SQL | BIRD (dev) | Execution Accuracy (EA) | 67.8 | 387 |
| Text-to-SQL | Spider (test) | Execution Accuracy | 88.68 | 213 |
| Text-to-SQL | Spider (dev) | EX | 82.34 | 147 |
| Text-to-SQL | Spider-DK | Execution Accuracy (EX) | 80.97 | 95 |
| Text-to-SQL | Spider-Syn | Execution Accuracy (EX) | 74.83 | 79 |
| Text-to-SQL | EHRSQL | Execution Accuracy | 55.46 | 61 |
| Text-to-SQL | Spider-Realistic | Execution Accuracy (EX) | 80.11 | 39 |
| Text-to-SQL | Science Benchmark | Execution Accuracy | 59.53 | 28 |
| Text-to-SQL Data Synthesis | BIRD Few Columns (train) | Token Cost (1k) | 1.28e+3 | 3 |
| Text-to-SQL Data Synthesis | BIRD Medium Columns (train) | Token Cost (1k Tokens) | 2.17e+3 | 3 |
Showing 10 of 11 rows
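
Most rows above report execution accuracy, which scores a predicted query by running it and comparing its result with that of the gold query. The sketch below illustrates the idea with Python's built-in `sqlite3` module. It assumes results are compared as multisets of rows (ignoring order) and that a query that fails to execute counts as wrong; the official evaluation scripts for each benchmark may differ in these details, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn: sqlite3.Connection, sql: str):
    """Execute sql and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Percentage of (predicted_sql, gold_sql) pairs whose results match.

    A pair counts as correct only when the gold query runs successfully and
    the predicted query returns the same multiset of rows.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return 100.0 * correct / len(pairs)
```

Because the comparison is on results rather than query text, two syntactically different queries that return the same rows both count as correct.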
