
Linformer: Self-Attention with Linear Complexity

About

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from $O(n^2)$ to $O(n)$ in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient.
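The abstract's core idea can be sketched numerically: project the sequence axis of the keys and values from length $n$ down to a fixed $k \ll n$ with learned linear maps, so the attention matrix is $n \times k$ rather than $n \times n$. The minimal NumPy sketch below illustrates this; the projection matrices `E` and `F`, the random initialization, and all shapes are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def linformer_attention(Q, K, V, E, F):
    """Linformer-style linear attention (single head, sketch).

    Q, K, V: (n, d) query/key/value matrices.
    E, F:    (k, n) projections that compress the sequence axis of
             K and V from n tokens down to k << n.
    The score matrix is (n, k), so cost is O(n*k) in time and space
    instead of the O(n^2) of full self-attention.
    """
    d = Q.shape[-1]
    K_proj = E @ K                        # (k, d) compressed keys
    V_proj = F @ V                        # (k, d) compressed values
    scores = Q @ K_proj.T / np.sqrt(d)    # (n, k) -- linear in n
    P = softmax(scores, axis=-1)          # low-rank attention weights
    return P @ V_proj                     # (n, d) output, same as full attention

rng = np.random.default_rng(0)
n, d, k = 256, 64, 32                     # illustrative sizes
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((k, n)) / np.sqrt(n)
F = rng.standard_normal((k, n)) / np.sqrt(n)
out = linformer_attention(Q, K, V, E, F)
print(out.shape)
```

The output keeps the full-attention shape `(n, d)`; only the intermediate attention matrix shrinks, which is why the approximation is a drop-in replacement inside a Transformer layer.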

Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma • 2020

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Image Classification | CIFAR-100 (test) | Accuracy | 70.87 | 3518 |
| Image Classification | CIFAR-10 (test) | Accuracy | 92.45 | 3381 |
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 78.7 | 1866 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 77.6 | 1206 |
| Language Modeling | PTB | Perplexity | 48.9 | 650 |
| Language Modeling | WikiText-103 (test) | Perplexity | 26.1 | 524 |
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) | 93.1 | 504 |
| Language Modeling | PTB (test) | Perplexity | 48.9 | 471 |
| Image Classification | CIFAR-10 | -- | -- | 471 |
| Long-range sequence modeling | Long Range Arena (LRA) | Text Accuracy | 57.29 | 164 |
Showing 10 of 57 rows.
