
Graph Convolutions Enrich the Self-Attention in Transformers!

About

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, and more. However, deep Transformer models suffer from the oversmoothing problem, where token representations converge to indistinguishable values across layers, causing significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective. We propose graph-filter-based self-attention (GFSA), which learns a more general yet effective filter at a complexity only slightly larger than that of the original self-attention mechanism. We demonstrate that GFSA improves the performance of Transformers across various fields, including computer vision, natural language processing, graph-level tasks, speech recognition, and code classification.
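To make the graph-filter view concrete, here is a minimal NumPy sketch: the softmax attention matrix is treated as a (row-stochastic) adjacency over tokens, and plain self-attention corresponds to the trivial filter H = A. A polynomial filter such as H = w0·I + w1·A + w2·A² generalizes it. The coefficients `w0`, `w1`, `w2` and the second-order truncation are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_filter_attention(Q, K, V, w0=0.2, w1=1.0, w2=0.3):
    """Self-attention with a polynomial graph filter over the attention matrix.

    Hypothetical coefficients (w0, w1, w2); with w0=w2=0 and w1=1 this
    reduces exactly to standard scaled dot-product attention.
    """
    d = Q.shape[-1]
    # Attention matrix, viewed as a graph adjacency over tokens.
    A = softmax(Q @ K.T / np.sqrt(d))
    # Polynomial graph filter: H = w0*I + w1*A + w2*A^2.
    H = w0 * np.eye(A.shape[0]) + w1 * A + w2 * (A @ A)
    return H @ V
```

The identity term lets each token retain its own signal, and the higher-order term mixes multi-hop neighbors; intuitively, this is what counteracts the low-pass (oversmoothing) behavior of the plain filter H = A.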

Jeongwhan Choi, Hyowon Wi, Jayoung Kim, Yehjin Shin, Kookjin Lee, Nathaniel Trask, Noseong Park • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Language Modeling | WikiText-2 (test) | PPL | 20.923 | 1541 |
| Language Modeling | WikiText-103 (test) | Perplexity | 15.919 | 524 |
| Image Classification | ImageNet-1K | Top-1 Acc | 83 | 524 |
| Image Classification | ImageNet 1k (test) | Top-1 Accuracy | 83 | 359 |
| Natural Language Understanding | GLUE (val) | SST-2 | 95.41 | 170 |
| Language Modeling | Penn Treebank (PTB) (test) | Perplexity | 19.45 | 120 |
| Graph Classification | CIFAR10 | Accuracy | 72.44 | 108 |
| Graph Regression | ZINC | MAE | 0.069 | 96 |
| Graph Classification | MNIST | Accuracy | 98.26 | 95 |
| Graph Regression | OGB-LSC PCQM4M v2 (val) | MAE | 0.086 | 81 |

Showing 10 of 25 rows
