
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

About

Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of tens of thousands of sub-word tokens. When fine-tuned to target constrained formal languages like SQL, these models often generate invalid code, rendering it unusable. We propose PICARD (code and trained models available at https://github.com/ElementAI/picard), a method for constraining auto-regressive decoders of language models through incremental parsing. PICARD helps to find valid output sequences by rejecting inadmissible tokens at each decoding step. On the challenging Spider and CoSQL text-to-SQL translation tasks, we show that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.
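The idea of rejecting inadmissible tokens at each decoding step can be illustrated with a toy sketch. This is not the PICARD implementation: the grammar, the `is_valid_prefix` check, the `toy_scores` stand-in model, and the greedy search are all assumptions made for illustration, and word-level tokens replace real sub-word tokens.

```python
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"
COLUMNS = {"name", "age"}   # hypothetical schema columns
TABLES = {"users"}          # hypothetical schema tables

# Toy grammar (assumption): SELECT <column> FROM <table> <eos>
EXPECTED = [{"SELECT"}, COLUMNS, {"FROM"}, TABLES, {EOS}]


def is_valid_prefix(tokens: Sequence[str]) -> bool:
    """Return True if `tokens` can still be extended to a valid toy query."""
    if len(tokens) > len(EXPECTED):
        return False
    return all(tok in allowed for tok, allowed in zip(tokens, EXPECTED))


def greedy_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    vocab: Sequence[str],
    constrain: bool = True,
    top_k: int = 3,
    max_steps: int = 10,
) -> List[str]:
    """Greedy decoding that optionally rejects inadmissible tokens."""
    prefix: List[str] = []
    for _ in range(max_steps):
        scores = score_fn(prefix)
        # Rank the vocabulary by model score and keep the top-k candidates.
        ranked = sorted(vocab, key=lambda t: scores.get(t, float("-inf")), reverse=True)
        candidates = ranked[:top_k]
        if constrain:
            # Incremental check: keep only tokens that leave the prefix parseable.
            candidates = [t for t in candidates if is_valid_prefix(prefix + [t])]
        if not candidates:
            break  # no admissible token among the top-k
        prefix.append(candidates[0])
        if prefix[-1] == EOS:
            break
    return prefix


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a fine-tuned model; it prefers the typo 'nmae'."""
    table = {
        0: {"SELECT": 0.9, "FROM": 0.1},
        1: {"nmae": 0.6, "name": 0.3, "age": 0.1},
        2: {"FROM": 0.8, "SELECT": 0.2},
        3: {"users": 0.7, "user": 0.3},
    }
    return table.get(len(prefix), {EOS: 1.0})


VOCAB = ["SELECT", "FROM", "name", "nmae", "age", "users", "user", EOS]

unconstrained = greedy_decode(toy_scores, VOCAB, constrain=False)
constrained = greedy_decode(toy_scores, VOCAB, constrain=True)
print(" ".join(unconstrained))
print(" ".join(constrained))
```

In this sketch the unconstrained run emits the invalid column `nmae`, while the constrained run falls back to the next-best admissible token, `name`. Greedy search is used here only to keep the example short.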

Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Text-to-SQL | Spider (test) | Execution Accuracy: 75.1 | 140 |
| Text-to-SQL | Spider (dev) | EX (All): 79.3 | 100 |
| Text-to-SQL | Spider 1.0 (dev) | Exact Match Accuracy: 75.5 | 92 |
| Text-to-SQL | Spider 1.0 (test) | EM Acc (Overall): 71.9 | 91 |
| Text-to-SQL | Spider-Realistic | Execution Accuracy (EX): 71.4 | 33 |
| Context-dependent Text-to-SQL | CoSQL (dev) | Question Match: 56.9 | 22 |
| Context-dependent Text-to-SQL | CoSQL (test) | Question Match: 54.6 | 12 |
| Semantic Parsing | Spider (dev) | Exact Match Accuracy: 75.5 | 11 |
| Text-to-SQL | Spider hidden (test) | Exact Match (EM): 71.9 | 10 |
| Text-to-SQL | Spider-Realistic 1.0 (test) | Exact Match (EM): 68.7 | 9 |
Showing 10 of 14 rows
