Gaussian Error Linear Units (GELUs)

About

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as ReLUs do ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
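To make the definition concrete, here is a minimal sketch of the exact form $x\Phi(x)$ in plain Python, next to ReLU for comparison. It assumes scalar inputs and writes $\Phi$ through the error function, using the standard identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. The function names are illustrative and are not taken from the paper.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard Gaussian CDF, Phi(x), expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu(x: float) -> float:
    """Exact GELU: the input weighted by Phi(x)."""
    return x * std_normal_cdf(x)


def relu(x: float) -> float:
    """ReLU: the input gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Compare the two activations on a few sample points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

For negative inputs the sketch returns a small negative value where ReLU returns exactly zero, which is the "weights by value" behaviour the abstract describes. For large positive inputs $\Phi(x)$ approaches 1, so the two activations nearly coincide.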

Dan Hendrycks, Kevin Gimpel • 2016

Related benchmarks

Task	Dataset	Result
Graph Classification	PROTEINS	Accuracy 76.6	1252
Graph Classification	MUTAG	Accuracy 90.9	1103
Language Modeling	WikiText-103 (test)	Perplexity 15.82	703
Graph Classification	NCI1	Accuracy 83.5	658
Image Classification	Fashion MNIST (test)	Accuracy 89.84	633
Image Classification	SVHN (test)	--	470
Graph Classification	NCI109	Accuracy 82.9	267
Image Classification	MNIST (test)	--	201
Graph Classification	PTC	Accuracy 65.4	167
Language Modeling	WikiText-2 (val)	Perplexity (BVS) 28.46	70

Showing 10 of 55 rows
