Masked Language Modeling on WikiText-103 (val)
[Chart: Validation Loss over time. Best result: GEM (N = 1), 6.6764, Apr 23, 2026. Series: Validation Loss, Val Loss Uncertainty, GELU Delta.]
Updated 1mo ago
Evaluation Results
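| Method       | Method details            | Links   | Validation Loss | Val Loss Uncertainty | GELU Delta |
|--------------|---------------------------|---------|-----------------|----------------------|------------|
| GEM (N = 1)  | Architecture=BERT-smal... | 2026.04 | 6.6764          | 0.034                | 0.012      |
| GELU (tanh)  | Architecture=BERT-smal... | 2026.04 | 6.6815          | 0.03                 | 0.007      |
| GEM (N = 2)  | Architecture=BERT-smal... | 2026.04 | 6.6819          | 0.036                | 0.006      |
| GELU         | Architecture=BERT-smal... | 2026.04 | 6.688           | 0.04                 | -          |
| ReLU         | Architecture=BERT-smal... | 2026.04 | 6.6899          | 0.045                | 0.002      |
| SiLU/Swish   | Architecture=BERT-smal... | 2026.04 | 6.6962          | 0.026                | 0.008      |

Lower validation loss is better. GELU is the reference method for the GELU Delta column.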
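The GELU Delta column appears to hold the size of the gap between each method's validation loss and the GELU baseline (6.688), without a sign: ReLU and SiLU/Swish have a higher loss than GELU, yet their deltas are listed as positive. The page does not define the column, so that reading is an assumption. The minimal Python sketch below, assuming that definition, recomputes signed deltas from the Validation Loss column and checks each against the reported Val Loss Uncertainty. The names `VAL_LOSS`, `gelu_deltas` and the other constants are hypothetical, chosen for illustration only.

```python
# Values transcribed from the Evaluation Results table (WikiText-103 val).
VAL_LOSS = {
    "GEM (N = 1)": 6.6764,
    "GELU (tanh)": 6.6815,
    "GEM (N = 2)": 6.6819,
    "GELU": 6.688,
    "ReLU": 6.6899,
    "SiLU/Swish": 6.6962,
}
VAL_LOSS_UNCERTAINTY = {
    "GEM (N = 1)": 0.034,
    "GELU (tanh)": 0.03,
    "GEM (N = 2)": 0.036,
    "GELU": 0.04,
    "ReLU": 0.045,
    "SiLU/Swish": 0.026,
}
# GELU Delta as displayed on the page (assumed to be an unsigned magnitude).
REPORTED_GELU_DELTA = {
    "GEM (N = 1)": 0.012,
    "GELU (tanh)": 0.007,
    "GEM (N = 2)": 0.006,
    "ReLU": 0.002,
    "SiLU/Swish": 0.008,
}


def gelu_deltas(losses, baseline="GELU"):
    """Return baseline_loss - method_loss for every non-baseline method.

    Positive values mean the method reached a lower (better) validation
    loss than the baseline; negative values mean a higher (worse) loss.
    """
    base = losses[baseline]
    return {m: base - loss for m, loss in losses.items() if m != baseline}


if __name__ == "__main__":
    for method, delta in gelu_deltas(VAL_LOSS).items():
        # Compare the size of the gap with the reported run-to-run uncertainty.
        smaller = abs(delta) < VAL_LOSS_UNCERTAINTY[method]
        print(
            f"{method:<12} signed delta={delta:+.4f} "
            f"reported={REPORTED_GELU_DELTA[method]:.3f} "
            f"smaller than uncertainty={smaller}"
        )
```

Under this reading, every gap in the table is smaller than the corresponding Val Loss Uncertainty, so the ordering of methods should be read with that spread in mind.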