
DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks

About

Variants of dropout have been designed for fully-connected, convolutional, and recurrent layers in neural networks and have been shown to be effective in avoiding overfitting. As an appealing alternative to recurrent and convolutional layers, the fully-connected self-attention layer surprisingly lacks a specific dropout method. This paper explores the possibility of regularizing the attention weights in Transformers to prevent different contextualized feature vectors from co-adaptation. Experiments on a wide range of tasks show that DropAttention can improve performance and reduce overfitting.
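Based on the abstract's description, the sketch below illustrates one plausible way to apply dropout directly to attention weights; the function name, the element-wise drop mask, and the row renormalization are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_with_dropattention(q, k, v, drop_prob=0.1, training=True):
    """Scaled dot-product attention with dropout on the attention weights.

    A minimal, hypothetical sketch of attention-weight regularization;
    q, k, v have shape (batch, heads, seq_len, head_dim).
    """
    d_k = q.size(-1)
    # Standard attention weights over key positions.
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)

    if training and drop_prob > 0.0:
        # Randomly zero individual attention weights, then renormalize each
        # row so it still sums to 1 (one possible rescaling choice; the
        # paper may define a different normalization).
        mask = (torch.rand_like(weights) >= drop_prob).float()
        weights = weights * mask
        weights = weights / weights.sum(dim=-1, keepdim=True).clamp(min=1e-9)

    return torch.matmul(weights, v)
```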

Lin Zehui, Pengfei Liu, Luyao Huang, Junkun Chen, Xipeng Qiu, Xuanjing Huang • 2019

Related benchmarks

Task: Natural Language Understanding
Dataset: GLUE (test dev)
Result: MRPC Accuracy 90.2
Rank: 81
