Causal Language Modeling on WikiText-103 (val)
[Chart: validation perplexity over time. Top entry: GEM (N = 2), validation perplexity 72.57, Apr 23, 2026. Selectable metrics: Validation Perplexity, Validation Loss, GELU Delta.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Validation Perplexity | Validation Loss | GELU Delta |
|---|---|---|---|---|---|
| GEM (N = 2) | Activation=GEM (N = 2)... | 2026.04 | 72.57 | 4.2847 | 1.19 |
| GEM (N = 1) | Activation=GEM (N = 1)... | 2026.04 | 73.32 | 4.2948 | 0.44 |
| GELU | Activation=GELU, Throu... | 2026.04 | 73.76 | 4.3008 | - |
| GELU (tanh) | Activation=GELU (tanh)... | 2026.04 | 73.76 | 4.3008 | 0 |
| ReLU | Activation=ReLU, Throu... | 2026.04 | 77.64 | 4.3521 | 3.88 |
| SiLU/Swish | Activation=SiLU/Swish,... | 2026.04 | 82.53 | 4.4131 | 8.77 |
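The page does not say how the three metric columns relate to each other. The sketch below checks one plausible reading of the listed numbers, under two assumptions that are not stated in the source: that validation loss is mean cross-entropy in nats, so perplexity is exp(loss), and that GELU Delta is the absolute perplexity gap to the GELU row. The function names `perplexity_from_loss` and `gelu_delta` are hypothetical helpers introduced for illustration only.

```python
import math

# Rows copied from the results listed above:
# (method, validation perplexity, validation loss).
ROWS = [
    ("GEM (N = 2)", 72.57, 4.2847),
    ("GEM (N = 1)", 73.32, 4.2948),
    ("GELU", 73.76, 4.3008),
    ("GELU (tanh)", 73.76, 4.3008),
    ("ReLU", 77.64, 4.3521),
    ("SiLU/Swish", 82.53, 4.4131),
]

# Perplexity of the GELU row, used as the baseline (assumption).
GELU_BASELINE = 73.76


def perplexity_from_loss(loss: float) -> float:
    """Assumed relation: perplexity = exp(mean cross-entropy in nats)."""
    return math.exp(loss)


def gelu_delta(perplexity: float, baseline: float = GELU_BASELINE) -> float:
    """Assumed definition: absolute perplexity gap to GELU, 2 decimals."""
    return round(abs(perplexity - baseline), 2)


if __name__ == "__main__":
    for name, ppl, loss in ROWS:
        print(
            f"{name:12s} listed ppl={ppl:.2f} "
            f"exp(loss)={perplexity_from_loss(loss):.2f} "
            f"delta vs GELU={gelu_delta(ppl):.2f}"
        )
```

Under these assumptions the recomputed perplexities agree with the listed ones up to the rounding of the four-decimal loss values, and the recomputed gaps match the listed GELU Delta entries. Note that the gap is unsigned here: lower perplexity is better, so the GEM rows improve on GELU while ReLU and SiLU/Swish are worse, even though all deltas are shown as positive numbers.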