
Gaussian Error Linear Units (GELUs)

About

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.

Dan Hendrycks, Kevin Gimpel • 2016
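
The definition in the abstract, $x\Phi(x)$, can be written out directly. The following is a minimal sketch rather than the authors' implementation: it assumes scalar inputs and computes $\Phi$ through the identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$ using Python's standard library. The function names `standard_normal_cdf`, `gelu`, and `relu` are illustrative choices, not names from the paper.

```python
import math


def standard_normal_cdf(x: float) -> float:
    """Phi(x): CDF of the standard Gaussian, expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x): the input is weighted by its value."""
    return x * standard_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0]: the input is gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Compare the two nonlinearities on a few sample inputs.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, which maps every negative input to zero, this sketch gives small negative outputs for negative inputs, because $\Phi(x)$ is small but nonzero there.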

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Graph Classification | PROTEINS | Accuracy 76.6 | 1252 |
| Graph Classification | MUTAG | Accuracy 90.9 | 1103 |
| Language Modeling | WikiText-103 (test) | Perplexity 15.82 | 703 |
| Graph Classification | NCI1 | Accuracy 83.5 | 658 |
| Image Classification | Fashion MNIST (test) | Accuracy 89.84 | 633 |
| Image Classification | SVHN (test) | -- | 470 |
| Graph Classification | NCI109 | Accuracy 82.9 | 267 |
| Image Classification | MNIST (test) | -- | 201 |
| Graph Classification | PTC | Accuracy 65.4 | 167 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS) 28.46 | 70 |
Showing 10 of 55 rows
