Language Modeling on Pretraining Dataset
Chart: Train Loss (PT) over time, as of Dec 26, 2025. Current best: 3.133 (BHyT). Series: Train Loss (PT), Eval Loss (PT), Eval Perplexity (PT). Updated 4d ago.
Evaluation Results
| Method | Setup | Date | Train Loss (PT) | Eval Loss (PT) | Eval Perplexity (PT) |
| --- | --- | --- | --- | --- | --- |
| BHyT | Backbone=Llama-3B | 2025.12 | 3.133 | 3.107 | 22.346 |
| LNS | Backbone=Llama-3B | 2025.12 | 3.160 | 3.139 | 23.091 |
| Peri-LN | Backbone=Llama-3B | 2025.12 | 3.165 | 3.142 | 23.156 |
| RMSNorm | Backbone=Llama-3B | 2025.12 | 3.203 | 3.180 | 24.040 |
| BHyT | Model=Llama-1B, Evalua... | 2025.12 | 3.268 | 3.254 | 25.908 |
| LNS | Model=Llama-1B, Evalua... | 2025.12 | 3.280 | 3.271 | 26.342 |
| RMSNorm | Model=Llama-1B, Evalua... | 2025.12 | 3.281 | 3.272 | 26.353 |
| Peri-LN | Model=Llama-1B, Evalua... | 2025.12 | 3.288 | 3.279 | 26.545 |
| DyT | Model=Llama-1B, Evalua... | 2025.12 | 3.709 | 3.696 | 40.294 |
| DyT | Backbone=Llama-3B | 2025.12 | 3.877 | 3.855 | 47.244 |
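The loss and perplexity columns are related by the standard definition: perplexity is the exponential of the mean cross-entropy loss. Assuming the eval loss here is reported in nats per token (the usual convention, and consistent with the table to rounding), the relationship can be spot-checked with a minimal sketch:

```python
import math

def perplexity(eval_loss: float) -> float:
    """Perplexity = exp(mean cross-entropy loss in nats/token)."""
    return math.exp(eval_loss)

# Spot-check against a few table rows; small gaps come from the
# losses being rounded to three decimals in the table.
for method, loss, reported in [("BHyT", 3.107, 22.346),
                               ("LNS", 3.139, 23.091),
                               ("RMSNorm", 3.180, 24.040)]:
    print(f"{method}: exp({loss}) = {perplexity(loss):.3f} (reported {reported})")
```

This also means the ranking by Eval Loss (PT) and by Eval Perplexity (PT) is identical, since `exp` is monotonic.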