Masked Language Modeling on WikiText-103 (train)
Chart: Training Loss by date. Best: 6.6188, GEM (N = 1), Apr 23, 2026.
Evaluation Results
Method        Configuration               Date      Training Loss
GEM (N = 1)   Architecture=BERT-smal...   2026.04   6.6188
GEM (N = 2)   Architecture=BERT-smal...   2026.04   6.6267
GELU (tanh)   Architecture=BERT-smal...   2026.04   6.628
GELU          Architecture=BERT-smal...   2026.04   6.6341
ReLU          Architecture=BERT-smal...   2026.04   6.6413
SiLU/Swish    Architecture=BERT-smal...   2026.04   6.6436
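
For reference, the baseline methods listed here appear to be the standard activation functions of the same names. The sketch below gives their usual textbook definitions in plain Python. That these are the variants the benchmark used is an assumption, and GEM is left out because this page does not define it. The function names are illustrative only.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def silu(x: float) -> float:
    """SiLU/Swish: x * sigmoid(x). Fine for moderate |x|; exp may overflow for very negative x."""
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed to be what "GELU (tanh)" denotes)."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


if __name__ == "__main__":
    # Print each activation side by side at a few sample points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  silu={silu(x):.4f}  "
              f"gelu={gelu(x):.4f}  gelu_tanh={gelu_tanh(x):.4f}")
```

The exact and tanh forms of GELU differ only slightly at any given input, so the gap between their reported losses reflects training behavior rather than a large difference in the functions themselves.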