Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning
About
Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code and data cleaning) remains suboptimal. Improving performance typically requires task-specific fine-tuning, which depends on expensive human labeling and is prone to overfitting. In this work, we propose Table-LLM-Specialist, a self-trained fine-tuning paradigm designed for table tasks. Our key insight is that many table tasks admit two dual formulations: a generative version and a classification version. Leveraging this duality, we introduce a Generator-Validator paradigm that iteratively generates and validates training data using language models, enabling effective fine-tuning without manually labeled data. Extensive evaluations on Llama, GPT-3.5, and GPT-4 show that Table-LLM-Specialist achieves (1) strong performance across diverse tasks compared to base models (for example, GPT-3.5 models fine-tuned with Table-LLM-Specialist often surpass GPT-4-level quality); (2) lower deployment cost, since smaller fine-tuned models can reach high quality with reduced latency; and (3) better generalization across multiple benchmarks, due to training on diverse, systematically generated data from real-world tables. Our code is available at https://github.com/microsoft/Table-Specialist. Models fine-tuned with Table-LLM-Specialist have been integrated into Microsoft Excel and are deployed in production for automated table data cleaning.
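The abstract does not spell out the training loop, so the following is only a minimal sketch of how a Generator-Validator round might be organized, not the authors' implementation. All names (`generator_validator_round`, `self_train`, and the `toy_*` stand-ins) are hypothetical; the toy task (flagging a non-numeric cell in a comma-separated row) and the no-op fine-tuning step are assumptions chosen so the example runs without any model.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (table, proposed output); hypothetical format


def generator_validator_round(
    tables: List[str],
    generate: Callable[[str], str],
    validate: Callable[[str, str], bool],
) -> List[Example]:
    """Run one round: generate a candidate per table, keep validated ones."""
    kept: List[Example] = []
    for table in tables:
        candidate = generate(table)       # generative formulation
        if validate(table, candidate):    # classification formulation
            kept.append((table, candidate))
    return kept


def self_train(tables, generate, validate, fine_tune, rounds: int = 2):
    """Iterate rounds, accumulating validated examples without human labels."""
    training_data: List[Example] = []
    for _ in range(rounds):
        accepted = generator_validator_round(tables, generate, validate)
        for example in accepted:
            if example not in training_data:  # skip duplicates across rounds
                training_data.append(example)
        # Assumption: fine-tuning returns updated generator and validator.
        generate, validate = fine_tune(generate, validate, training_data)
    return training_data


# --- Toy stand-ins for the language models (assumptions, not real models) ---

def toy_generate(table: str) -> str:
    """Guess which cell is erroneous: first non-numeric cell, else first cell."""
    cells = table.split(",")
    non_numeric = [c for c in cells if not c.isdigit()]
    return non_numeric[0] if non_numeric else cells[0]


def toy_validate(table: str, cell: str) -> bool:
    """Classify whether the proposed cell really is a non-numeric cell."""
    return cell in table.split(",") and not cell.isdigit()


def toy_fine_tune(generate, validate, training_data):
    """Placeholder: a real system would fine-tune both models here."""
    return generate, validate


if __name__ == "__main__":
    rows = ["1,2,x,4", "5,6,7"]
    print(self_train(rows, toy_generate, toy_validate, toy_fine_tune))
```

In this sketch the second row contains no error, so the generator's guess is rejected by the validator and never enters the training set. That filtering step is what the duality is meant to provide in place of manual labels.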
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-SQL | Spider | Exec Acc (All) | 70.4 | 91 |
| Text-to-SQL | Bird | Total Execution Accuracy | 55.6 | 64 |
| Text-to-SQL | Bird | Accuracy | 47.5 | 27 |
| NL-to-SQL | WikiSQL | Execution Accuracy | 87.4 | 18 |
| NL-to-SQL | WikiTQ | Execution Accuracy | 59.7 | 12 |
| NL-to-SQL | Text2Analysis | Execution Accuracy | 57.2 | 12 |
| Data-transformation (Pandas) | TDE | Execution Accuracy | 45.6 | 6 |
| Data-transformation (R) | TDE | Execution Accuracy | 31.8 | 6 |
| Data-transformation (SQL) | TDE | Execution Accuracy | 20.2 | 6 |
| Data-transformation (SQL) | Transform-Text | Execution Accuracy | 22.7 | 6 |