Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

About

The transformer model is known to be computationally demanding and prohibitively costly for long sequences, as the self-attention module has quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models. In this work, the transformer's inefficiency is addressed from another perspective. We propose Fourier Transformer, a simple yet effective approach that progressively removes redundancies in the hidden sequence, using the ready-made Fast Fourier Transform (FFT) operator to perform the Discrete Cosine Transform (DCT). Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models. Experiments show that our model achieves state-of-the-art performance among all transformer-based models on the long-range modeling benchmark LRA, with significant improvements in both speed and space. For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART and other efficient models. Our code is publicly available at https://github.com/LUMIA-Group/FourierTransformer
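
As a rough illustration of the idea in the abstract, the sketch below computes a DCT along the sequence axis with an FFT, keeps only the lowest-frequency coefficients, and inverts at the shorter length. It is a minimal NumPy sketch under our own assumptions, not the authors' implementation: the function names (`dct_via_fft`, `idct_matrix`, `shorten`), the mirrored length-2N FFT construction, the orthonormal scaling, and the `sqrt(keep / n)` amplitude correction are all illustrative choices.

```python
import numpy as np

def dct_via_fft(x):
    """Orthonormal DCT-II along axis 0, computed with a length-2N FFT."""
    n = x.shape[0]
    # Mirror the sequence so its FFT reduces to cosine terms after a phase shift.
    mirrored = np.concatenate([x, x[::-1]], axis=0)
    spectrum = np.fft.fft(mirrored, axis=0)[:n]
    # Frequency index, shaped to broadcast over any trailing feature axes.
    k = np.arange(n).reshape((n,) + (1,) * (x.ndim - 1))
    coeffs = 0.5 * np.real(spectrum * np.exp(-1j * np.pi * k / (2 * n)))
    # Orthonormal scaling: sqrt(1/n) for k = 0, sqrt(2/n) otherwise.
    scale = np.full(k.shape, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return coeffs * scale

def idct_matrix(n):
    """Explicit inverse (transpose) of the orthonormal DCT-II matrix."""
    pos = np.arange(n)[:, None] + 0.5
    freq = np.arange(n)[None, :]
    basis = np.cos(np.pi * pos * freq / n) * np.sqrt(2.0 / n)
    basis[:, 0] = np.sqrt(1.0 / n)
    return basis

def shorten(hidden, keep):
    """Hypothetical helper: shrink a (seq_len, dim) array to (keep, dim)."""
    n = hidden.shape[0]
    # Drop high-frequency coefficients, then invert at the shorter length.
    low_freq = dct_via_fft(hidden)[:keep]
    # Assumed rescaling so a constant sequence keeps its amplitude.
    return np.sqrt(keep / n) * (idct_matrix(keep) @ low_freq)

hidden = np.random.default_rng(0).normal(size=(16, 4))
shorter = shorten(hidden, keep=4)  # shape (4, 4)
```

Here the inverse is written as an explicit cosine matrix for readability; a real model would presumably use an FFT-based inverse as well and apply the truncation between transformer layers.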

Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin • 2023

Related benchmarks

Task                          Dataset                          Result                  Rank
Visual Question Answering     VizWiz                           Accuracy 56.1           1525
Visual Question Answering     VQA v2                           Accuracy 70.8           1362
Visual Question Answering     TextVQA                          Accuracy 26.3           1285
Visual Question Answering     GQA                              Accuracy 57.3           1249
Multimodal Evaluation         MME                              Score 1.36e+3           658
Text Classification           SST-2                            Accuracy 90.7           125
Text Classification           IMDB                             Accuracy 92.4           112
Long sequence classification  LRA (Long Range Arena) (test)    Average Accuracy 67.54  92
Text Summarization            CNN/Daily Mail (test)            ROUGE-2 21.55           65
Visual Question Answering     ScienceQA image                  Accuracy 70.3           51
Showing 10 of 14 rows
