
Star-Transformer

About

Although the Transformer has achieved great success on many NLP tasks, its heavy structure with fully connected attention leads to a dependency on large training data. In this paper, we present the Star-Transformer, a lightweight alternative obtained by careful sparsification. To reduce model complexity, we replace the fully connected structure with a star-shaped topology, in which every pair of non-adjacent nodes is connected through a shared relay node. Complexity is thus reduced from quadratic to linear, while the capacity to capture both local composition and long-range dependencies is preserved. Experiments on four tasks (22 datasets) show that the Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
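The star-shaped topology described above can be pictured as an attention mask: one shared relay node attends to everything, and each token attends only to the relay and its immediate neighbors. The sketch below builds such a boolean mask in NumPy; it is an illustrative reconstruction based on the abstract, not the authors' released code, and the `ring` parameter (local window radius) is an assumption.

```python
import numpy as np

def star_mask(n_tokens: int, ring: int = 1) -> np.ndarray:
    """Boolean attention mask for a Star-Transformer-style topology.

    Index 0 is the shared relay node; indices 1..n_tokens are the token
    (satellite) nodes. Each satellite attends to itself, up to `ring`
    neighbors on either side, and the relay; the relay attends to all
    nodes. Illustrative sketch only, not the authors' implementation.
    """
    n = n_tokens + 1
    mask = np.zeros((n, n), dtype=bool)
    mask[0, :] = True   # relay attends to every node
    mask[:, 0] = True   # every satellite attends to the relay
    for i in range(1, n):                      # local ring connections
        for d in range(-ring, ring + 1):
            j = i + d
            if 1 <= j <= n_tokens:
                mask[i, j] = True
    return mask
```

Because each row of the mask has at most `2 * ring + 2` true entries (plus the single dense relay row), the number of attended pairs grows linearly in sequence length rather than quadratically, which is the complexity reduction the abstract claims.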

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang • 2019

Related benchmarks

Task | Dataset | Result | Rank
Natural Language Inference | SNLI (test) | Accuracy 86 | 681
Named Entity Recognition | CoNLL 2003 (test) | F1 Score 91.98 | 539
Text Classification | Pubmed | Micro-F1 82.35 | 50
POS Tagging | PTB (test) | Accuracy 97.68 | 24
Text Classification | Reuters | Micro-F1 80.22 | 22
Text Classification | AAPD | Micro-F1 68.22 | 17
Text Classification | SemEval | Micro-F1 51.42 | 17
Text Classification | CAVES | Micro-F1 53.86 | 17
Text Classification | SST-1 (test) | Accuracy 52.9 | 16
Part-of-Speech Tagging | Wall Street Journal (WSJ) section 23 (test) | Accuracy 97.04 | 12

Showing 10 of 13 rows.

Other info

Code
