PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

About

Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of 10,000s of sub-word tokens. When fine-tuned to target constrained formal languages like SQL, these models often generate invalid code, rendering it unusable. We propose PICARD (code and trained models available at https://github.com/ElementAI/picard), a method for constraining auto-regressive decoders of language models through incremental parsing. PICARD helps to find valid output sequences by rejecting inadmissible tokens at each decoding step. On the challenging Spider and CoSQL text-to-SQL translation tasks, we show that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.
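The core idea in the abstract, rejecting inadmissible tokens at each decoding step, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical five-token toy grammar in place of a real incremental SQL parser, a hard-coded scoring function in place of a fine-tuned T5 model, and greedy decoding for brevity. All names (`is_valid_prefix`, `constrained_greedy_decode`, `toy_scores`) are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy grammar: SELECT <column> FROM <table> <eos>
EOS = "<eos>"
COLUMNS = {"name", "age"}
TABLES = {"users"}
EXPECTED = [{"SELECT"}, COLUMNS, {"FROM"}, TABLES, {EOS}]

# Toy vocabulary; "colour" is not a column in the toy schema.
VOCAB = ["SELECT", "FROM", "name", "age", "colour", "users", EOS]


def is_valid_prefix(tokens: Sequence[str]) -> bool:
    """Stand-in for an incremental parser: can `tokens` still become valid?"""
    if len(tokens) > len(EXPECTED):
        return False
    return all(tok in allowed for tok, allowed in zip(tokens, EXPECTED))


def constrained_greedy_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    vocab: Sequence[str],
    max_steps: int = 10,
) -> List[str]:
    """Greedy decoding that discards tokens the prefix checker rejects."""
    prefix: List[str] = []
    for _ in range(max_steps):
        scores = score_fn(prefix)
        # Keep only tokens whose extended prefix is still admissible.
        admissible = [t for t in vocab if is_valid_prefix(prefix + [t])]
        if not admissible:
            break  # no valid continuation exists
        best = max(admissible, key=lambda t: scores.get(t, float("-inf")))
        prefix.append(best)
        if best == EOS:
            break
    return prefix


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in model that always prefers the invalid token "colour"."""
    return {"colour": 5.0, "name": 2.0, "age": 1.5, "SELECT": 1.0,
            "FROM": 1.0, "users": 1.0, EOS: 0.5}


result = constrained_greedy_decode(toy_scores, VOCAB)
print(" ".join(result))
```

Without the admissibility filter, this toy model would pick "colour" at every step. With it, the highest-scoring token is only chosen from among those that keep the output parseable.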

Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau • 2021

Related benchmarks

Task	Dataset	Result
Text-to-SQL	Spider (test)	Execution Accuracy 75.1	213
Text-to-SQL	Spider (dev)	EX 79.3	147
Text-to-SQL	Spider 1.0 (test)	EM Acc (Overall) 71.9	110
Text-to-SQL	Spider 1.0 (dev)	Exact Match Accuracy 75.5	92
Text-to-SQL	Spider-Realistic	Execution Accuracy (EX) 71.4	47
Context-dependent Text-to-SQL	CoSQL (dev)	Question Match 56.9	33
Context-dependent Text-to-SQL	CoSQL (test)	Question Match 54.6	12
Semantic Parsing	Spider (dev)	Exact Match Accuracy 75.5	11
Text-to-SQL	Spider hidden (test)	Exact Match (EM) 71.9	10
Text-to-SQL	Spider-Realistic 1.0 (test)	Exact Match (EM) 68.7	9

