Evaluating the Text-to-SQL Capabilities of Large Language Models

About

We perform an empirical evaluation of Text-to-SQL capabilities of the Codex language model. We find that, without any finetuning, Codex is a strong baseline on the Spider benchmark; we also analyze the failure modes of Codex in this setting. Furthermore, we demonstrate on the GeoQuery and Scholar benchmarks that a small number of in-domain examples provided in the prompt enables Codex to perform better than state-of-the-art models finetuned on such few-shot examples.
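The few-shot setting mentioned above, where a small number of in-domain examples are placed in the prompt, can be illustrated with a minimal sketch. The prompt layout, the function name `build_few_shot_prompt`, and the toy schema and example below are all assumptions made for illustration; they are not taken from the paper and do not reproduce its exact prompt format.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    schema: str,
    examples: List[Tuple[str, str]],
    question: str,
) -> str:
    """Assemble a Text-to-SQL prompt: schema, worked examples, then the new question.

    The layout (SQL comments for questions, a trailing "SELECT" cue) is a
    hypothetical choice for this sketch, not the paper's documented format.
    """
    parts = ["-- Database schema", schema.strip(), ""]
    # Each in-domain example is a (question, SQL) pair shown before the target question.
    for example_question, example_sql in examples:
        parts.append(f"-- Question: {example_question}")
        parts.append(example_sql.strip())
        parts.append("")
    # End with the new question and the start of a query for the model to complete.
    parts.append(f"-- Question: {question}")
    parts.append("SELECT")
    return "\n".join(parts)


# Made-up schema and example, for illustration only.
schema = "CREATE TABLE state (state_name TEXT, population INTEGER, area REAL);"
examples = [
    ("How many states are there?", "SELECT COUNT(*) FROM state;"),
]
prompt = build_few_shot_prompt(schema, examples, "Which state has the largest area?")
print(prompt)
```

The resulting string would be sent to a code-completion model, and the completion appended to the trailing `SELECT` to form the predicted query.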

Nitarshan Rajkumar, Raymond Li, Dzmitry Bahdanau • 2022

Related benchmarks

Task                            Dataset                     Result            Rank
Table Question Answering        WikiTQ                      Accuracy 52.9     149
Text-to-SQL                     Spider (dev)                --                147
Table Fact Verification         TabFact (test)              Accuracy 69.7     146
Table Question Answering        WikiTQ (test)               Accuracy 61.1     140
Table Question Answering        WikiTableQuestions (test)   Accuracy 52.9     86
Fact Verification               TabFact                     Accuracy 68.37    83
Table-based Fact Verification   TabFact                     Accuracy 64.71    49
Table Question Answering        STQA-N                      Accuracy 62.6     20
Table Question Answering        STQA L                      Accuracy 47.1     20