Sparse Sequence-to-Sequence Models

About

Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-sequence models, rooted in a new family of $\alpha$-entmax transformations, which includes softmax and sparsemax as particular cases, and is sparse for any $\alpha > 1$. We provide fast algorithms to evaluate these transformations and their gradients, which scale well for large vocabulary sizes. Our models are able to produce sparse alignments and to assign nonzero probability to a short list of plausible outputs, sometimes rendering beam search exact. Experiments on morphological inflection and machine translation reveal consistent gains over dense models.
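For a given $\alpha > 1$, the transformation has the closed form $\alpha\text{-entmax}(\mathbf{z}) = [(\alpha - 1)\mathbf{z} - \tau\mathbf{1}]_+^{1/(\alpha - 1)}$, where the threshold $\tau$ is chosen so the output sums to one; coordinates whose scores fall below the threshold get exactly zero probability, which is the source of the sparsity. The sketch below is a minimal NumPy illustration of the general bisection approach the paper describes, not the authors' optimized implementation; the function name, iteration count, and demo scores are illustrative.

```python
import numpy as np

def alpha_entmax(z, alpha=1.5, n_iter=50):
    """Illustrative bisection solver for alpha-entmax (alpha > 1).

    Finds tau such that p = [(alpha - 1) * z - tau]_+ ** (1 / (alpha - 1))
    sums to 1. alpha = 2 recovers sparsemax; softmax is the alpha -> 1 limit.
    """
    zt = (alpha - 1.0) * np.asarray(z, dtype=float)
    # sum(p) is decreasing in tau and crosses 1 inside this bracket:
    # at tau = max(zt) - 1 the largest entry alone contributes 1;
    # at tau = max(zt) every entry is clipped to 0.
    lo, hi = zt.max() - 1.0, zt.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        p = np.clip(zt - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() >= 1.0:
            lo = tau  # total mass too large -> raise the threshold
        else:
            hi = tau  # total mass too small -> lower the threshold
    p = np.clip(zt - lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()  # normalize away the tiny bisection residual

scores = np.array([3.0, 1.5, 1.2, -2.0])
print(alpha_entmax(scores, alpha=1.5))  # low-scoring entries come out exactly 0
print(alpha_entmax(scores, alpha=2.0))  # sparsemax: typically even sparser
```

For the scores above, the low-scoring entry receives exactly zero mass rather than a small positive probability, which is what yields the sparse alignments and short output lists described in the abstract. The paper additionally provides exact sorting-based solvers for special cases such as $\alpha = 1.5$ and a closed-form gradient, which is what makes the transformation practical at machine-translation vocabulary sizes.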

Ben Peters, Vlad Niculae, André F. T. Martins • 2019

Related benchmarks

Task | Dataset | Metric | Result | Rank
Commonsense Reasoning | HellaSwag | Accuracy | 32.8 | 1891
Commonsense Reasoning | WinoGrande | Accuracy | 50.9 | 1085
Commonsense Reasoning | PIQA | Accuracy | 63.6 | 751
Machine Translation | WMT En-De 2014 (test) | BLEU | 25.89 | 379
Language Modeling | LAMBADA | Accuracy | 32.1 | 268
Machine Translation | WMT Ro-En 2016 (test) | BLEU | 33.1 | 84
Language Modeling | Arxiv Proof-pile | Perplexity | 16.85 | 40
Language Modeling | Pubmed | Perplexity | 17.64 | 38
MQMTAR | MQMTAR OOD lengths (2x, 4x, 16x, 64x, 256x, 1024x) | Exact Match Accuracy | 100 | 30
Copy | Copy OOD lengths (2x, 4x, 8x, 16x, 32x, 64x) | Exact Match Accuracy | 99 | 30

Showing 10 of 24 rows.
