Causal Language Modeling on WikiText-103 GPT-2 (124M) (train)
[Chart: Train Loss (y-axis) by date (x-axis, Apr 23, 2026); highlighted result: GEM (N = 2), 4.4614]
Updated 1mo ago
Evaluation Results
Method        Details                     Date      Train Loss
GEM (N = 2)   Activation=GEM (N = 2)...   2026.04   4.4614
GEM (N = 1)   Activation=GEM (N = 1)...   2026.04   4.4727
GELU          Activation=GELU, Throu...   2026.04   4.4833
GELU (tanh)   Activation=GELU (tanh)...   2026.04   4.4833
ReLU          Activation=ReLU, Throu...   2026.04   4.5341
SiLU/Swish    Activation=SiLU/Swish,...   2026.04   4.5967
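
The gaps between rows are small, so it can help to see them as differences against one reference row. The sketch below is a minimal illustration, not part of the benchmark: it copies the six Train Loss values listed above and subtracts a baseline from each. The choice of GELU as the baseline and the helper names `gaps_vs_baseline` and `ranked` are assumptions made for this example only. Lower loss is better, so a negative gap means the method beats the baseline.

```python
# Train Loss values copied from the results listed above.
train_loss = {
    "GEM (N = 2)": 4.4614,
    "GEM (N = 1)": 4.4727,
    "GELU": 4.4833,
    "GELU (tanh)": 4.4833,
    "ReLU": 4.5341,
    "SiLU/Swish": 4.5967,
}


def gaps_vs_baseline(losses, baseline="GELU"):
    """Hypothetical helper: each method's loss minus the baseline's loss.

    Results are rounded to 4 decimals, matching the precision of the table.
    """
    ref = losses[baseline]
    return {name: round(value - ref, 4) for name, value in losses.items()}


def ranked(losses):
    """Hypothetical helper: method names sorted from lowest to highest loss."""
    return sorted(losses, key=losses.get)


if __name__ == "__main__":
    gaps = gaps_vs_baseline(train_loss)
    for name in ranked(train_loss):
        # Print the method, its loss, and its signed gap to the baseline.
        print(f"{name:<12} {train_loss[name]:.4f} {gaps[name]:+.4f}")
```

These differences describe only the single reported training-loss figure per method; the page gives no variance or repeated runs, so they should not be read as significance estimates.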