Language Modeling on FineWeb-EDU (train)
[Chart: Loss (y-axis ticks 2.9802 to 3.2394). Marked point: sHC, Loss 2.993, Mar 21, 2026. Metric tabs: Loss, Train PPL, Time per Step (s), GPU Memory (GB).]
Updated 26d ago
Evaluation Results
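| Method   | Configuration             | Links   | Loss  | Train PPL | Time per Step (s) | GPU Memory (GB) |
|----------|---------------------------|---------|-------|-----------|-------------------|-----------------|
| sHC      | Model Scale=L             | 2026.03 | 2.993 | -         | -                 | -               |
| mHC-lite | Model Scale=L             | 2026.03 | 3.013 | -         | -                 | -               |
| mHC      | Model Scale=L             | 2026.03 | 3.017 | -         | -                 | -               |
| RC       | Model Scale=L             | 2026.03 | 3.058 | -         | -                 | -               |
| HC       | Model Scale=L             | 2026.03 | 3.112 | -         | -                 | -               |
| sHC      | Model Scale=M             | 2026.03 | 3.23  | -         | -                 | -               |
| mHC      | Model Scale=M             | 2026.03 | 3.241 | -         | -                 | -               |
| mHC-lite | Model Scale=M             | 2026.03 | 3.241 | -         | -                 | -               |
| HC       | Model Scale=M             | 2026.03 | 3.276 | -         | -                 | -               |
| RC       | Model Scale=M             | 2026.03 | 3.313 | -         | -                 | -               |
| AdamW    | Model architecture=Lla... | 2026.03 | -     | 10.83     | 25.12             | 37.62           |
| Muon     | Model architecture=Lla... | 2026.03 | -     | 10.01     | 25.39             | 32.51           |
| NuMuon   | Model architecture=Lla... | 2026.03 | -     | 10.59     | 30.03             | 33.88           |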