Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

About

Language models such as GPT and Llama have shown remarkable ability on diverse natural-language tasks, yet their performance on complex table tasks (e.g., NL-to-Code and data cleaning) remains suboptimal. Improving performance typically requires task-specific fine-tuning, which depends on expensive human labeling and is prone to overfitting. In this work, we propose Table-LLM-Specialist, a self-trained fine-tuning paradigm designed for table tasks. Our key insight is that many table tasks admit two dual formulations: a generative version and a classification version. Leveraging this duality, we introduce a Generator-Validator paradigm that iteratively generates and validates training data using language models, enabling effective fine-tuning without manually labeled data. Extensive evaluations on Llama, GPT-3.5, and GPT-4 show that Table-LLM-Specialist achieves (1) strong performance across diverse tasks relative to the base models (for example, models fine-tuned from GPT-3.5 often surpass GPT-4-level quality); (2) lower deployment cost, because smaller models can reach high quality with reduced latency and cost; and (3) better generalization across multiple benchmarks, owing to training on diverse, systematically generated data from real-world tables. Our code is available at https://github.com/microsoft/Table-Specialist. Models fine-tuned with Table-LLM-Specialist have been integrated into Microsoft Excel and are deployed in production for automated table data cleaning.
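The abstract describes the Generator-Validator loop only at a high level. The Python sketch below illustrates the idea on a toy error-detection task, under clearly stated assumptions: `toy_generate`, `toy_validate`, `generator_validator_round`, `run_rounds`, and the `fine_tune` callback are all hypothetical rule-based stand-ins for language-model calls, and none of them come from the released code at https://github.com/microsoft/Table-Specialist. The generative version injects an error into a clean column and names the bad value; the dual classification version checks whether that value really is the single anomalous cell, and only examples the validator confirms are kept as training data.

```python
import random
from typing import Callable, List, Tuple

Column = List[str]
Example = Tuple[Column, str]  # (column with one injected error, the erroneous value)


def is_number(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False


def toy_generate(column: Column, rng: random.Random) -> Example:
    """Hypothetical generator (generative version of the task):
    corrupt one cell of a clean column and report which value is wrong."""
    corrupted = list(column)
    i = rng.randrange(len(corrupted))
    corrupted[i] = corrupted[i] + "x"  # toy corruption: append a stray character
    return corrupted, corrupted[i]


def toy_validate(column: Column, candidate: str) -> bool:
    """Hypothetical validator (classification version of the task):
    accept only if `candidate` is the single non-numeric value in the column."""
    non_numeric = [v for v in column if not is_number(v)]
    return non_numeric == [candidate]


def generator_validator_round(
    columns: List[Column],
    generate: Callable[[Column, random.Random], Example],
    validate: Callable[[Column, str], bool],
    rng: random.Random,
) -> List[Example]:
    """Generate one candidate example per column; keep those the validator confirms."""
    kept: List[Example] = []
    for column in columns:
        corrupted, answer = generate(column, rng)
        if validate(corrupted, answer):
            kept.append((corrupted, answer))
    return kept


def run_rounds(
    columns: List[Column],
    rounds: int,
    fine_tune: Callable[[List[Example]], None] = lambda data: None,
    seed: int = 0,
) -> List[Example]:
    """Iterate generate -> validate -> fine-tune. `fine_tune` is a placeholder
    (hypothetical); a real system would update its model(s) on the kept data."""
    rng = random.Random(seed)
    training_data: List[Example] = []
    for _ in range(rounds):
        training_data.extend(
            generator_validator_round(columns, toy_generate, toy_validate, rng)
        )
        fine_tune(training_data)
    return training_data


columns = [["1", "2", "3"], ["apple", "pear", "plum"]]
data = run_rounds(columns, rounds=2)
print(len(data))  # the text-only column is rejected every round
```

In this toy run, the corrupted text-only column is filtered out in every round because the rule-based validator cannot confirm the injected value, a miniature version of the validation step the abstract describes. In the actual method both roles are played by language models rather than hand-written rules.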

Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang, Surajit Chaudhuri• 2024

Related benchmarks

Task	Dataset	Metric	Result
Text-to-SQL	Spider	Execution Accuracy (All)	70.4	139
Text-to-SQL	Bird	Total Execution Accuracy	55.6	68
Text-to-SQL	Bird	Accuracy	47.5	27
NL-to-SQL	WikiTQ	Execution Accuracy	59.7	22
NL-to-SQL	WikiSQL	Execution Accuracy	87.4	18
NL-to-SQL	Text2Analysis	Execution Accuracy	57.2	12
Data-transformation (Pandas)	TDE	Execution Accuracy	45.6	6
Data-transformation (R)	TDE	Execution Accuracy	31.8	6
Data-transformation (SQL)	TDE	Execution Accuracy	20.2	6
Data-transformation (SQL)	Transform-Text	Execution Accuracy	22.7	6
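Most rows in this table report execution accuracy, which is commonly computed as the fraction of predicted queries whose execution result matches that of a reference (gold) query. The Python sketch below shows that computation for SQL using the standard-library `sqlite3` module. The function names, the toy table `t`, and the order-insensitive row comparison are illustrative assumptions; each benchmark (Spider, Bird, WikiSQL, and so on) ships its own evaluation script with its own matching rules, and the Pandas and R rows would execute generated code rather than SQL, which this sketch does not cover.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """True if the predicted query runs and returns the same rows as the gold query.
    Rows are compared as a multiset (order-insensitive), an assumption that
    real benchmark scripts may handle differently, e.g. for ORDER BY queries."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as incorrect
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(predicted_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) pairs whose execution results match."""
    if not pairs:
        return 0.0
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
pairs = [
    ("SELECT a FROM t WHERE a > 1", "SELECT a FROM t WHERE a >= 2"),  # same rows
    ("SELECT a FROM t", "SELECT a FROM t WHERE a = 1"),              # different rows
    ("SELEC a FROM t", "SELECT a FROM t"),                           # syntax error
]
print(execution_accuracy(conn, pairs))
```

Comparing results rather than query strings is what lets two differently written but equivalent queries (the first pair above) both count as correct.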

