
Gaussian Error Linear Units (GELUs)

About

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.

Dan Hendrycks, Kevin Gimpel • 2016
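
As a quick illustration (not part of the original page), here is a minimal Python sketch of the exact GELU, $x\Phi(x)$, computed via the Gaussian error function, with ReLU shown for comparison. The function names and the sample inputs are chosen for illustration only.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    written here via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    """ReLU gates inputs by their sign: x * 1_{x > 0}."""
    return x if x > 0.0 else 0.0

# GELU weights inputs by their value rather than hard-gating them,
# so small negative inputs are attenuated instead of zeroed outright.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```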

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Graph Classification | PROTEINS | Accuracy | 76.6 | 742 |
| Graph Classification | MUTAG | Accuracy | 90.9 | 697 |
| Graph Classification | NCI1 | Accuracy | 83.5 | 460 |
| Graph Classification | NCI109 | Accuracy | 82.9 | 223 |
| Graph Classification | PTC | Accuracy | 65.4 | 167 |
| Graph Classification | MOLTOX21 | ROC-AUC | 0.7429 | 38 |
| Graph Classification | MOLBACE | ROC-AUC | 0.7559 | 31 |
| Regression | molesol OGB | RMSE | 1.147 | 26 |
| Language Modeling | C4 T5 (val) | PPLX | 19.58 | 20 |
| Graph Classification | OGB-MOLHIV | ROC-AUC | 0.7415 | 15 |
Showing 10 of 14 rows
