Language Modeling on Pre-training corpus
[Chart: Loss over time, Oct 21, 2025 to Apr 10, 2026. Top entry: Muon, Loss 1.577]
Updated 6d ago
Evaluation Results
Method        Configuration              Date     Loss
Muon          Optimizer=Muon, Model...   2026.04  1.577
Adam+Nexus    Optimizer=Adam+Nexus,...   2026.04  1.602
AdamW         Optimizer=AdamW, Model...  2026.04  1.606
Panda-3B      d_model=4096, f_size=4...  2025.10  2.619
Surefire-3B   d_model=4096, f_size=4...  2025.10  2.62
LLaMA-3.2-3B  d_model=3072, f_size=8...  2025.10  2.625
Panda-1B      d_model=2560, f_size=4...  2025.10  2.782
LLaMA-3.2-1B  d_model=2048, f_size=8...  2025.10  2.803
Surefire-1B   d_model=2560, f_size=6...  2025.10  2.804