Language Modeling on FineWeb-Edu 10K steps (train)
[Figure: Training Loss over time, up to May 13, 2026. Best result: Delta Block, 3.338.]
Evaluation Results

| Method | Configuration | Date | Training Loss |
|---|---|---|---|
| Delta Block | Scale=1044M, d=1280, L... | 2026.05 | 3.338 |
| Delta AttnRes | Scale=1044M, d=1280, L... | 2026.05 | 3.339 |
| Baseline | Scale=1044M, d=1280, L... | 2026.05 | 3.36 |
| Delta AttnRes | Scale=533M, d=1024, L=... | 2026.05 | 3.401 |
| Delta Block | Scale=533M, d=1024, L=... | 2026.05 | 3.405 |
| Full AttnRes | Scale=533M, d=1024, L=... | 2026.05 | 3.422 |
| AttnRes | Scale=533M, d=1024, L=... | 2026.05 | 3.423 |
| Baseline | Scale=533M, d=1024, L=... | 2026.05 | 3.428 |
| AttnRes | Scale=1044M, d=1280, L... | 2026.05 | 3.428 |
| Full AttnRes | Scale=1044M, d=1280, L... | 2026.05 | 3.474 |
| Delta AttnRes | Scale=220M, d=768, L=1... | 2026.05 | 3.563 |
| Full AttnRes | Scale=220M, d=768, L=1... | 2026.05 | 3.577 |
| Delta Block | Scale=220M, d=768, L=1... | 2026.05 | 3.577 |
| AttnRes | Scale=220M, d=768, L=1... | 2026.05 | 3.586 |
| Baseline | Scale=220M, d=768, L=1... | 2026.05 | 3.613 |
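The table is sorted globally by loss, which mixes model scales. A minimal sketch for comparing methods at a fixed scale is shown below. It regroups the training-loss values from the table by Scale and reports each method's difference from Baseline at that scale. The values are copied from the table. The grouping and the helper name `delta_vs_baseline` are hypothetical, and the sketch assumes that rows sharing a Scale value differ only in the method (the configurations are truncated on the page, so this cannot be confirmed).

```python
# Training-loss values copied from the leaderboard table, grouped by model scale.
# Assumption: rows with the same Scale share the same (truncated) configuration.
RESULTS = {
    "220M": {"Baseline": 3.613, "AttnRes": 3.586, "Full AttnRes": 3.577,
             "Delta Block": 3.577, "Delta AttnRes": 3.563},
    "533M": {"Baseline": 3.428, "AttnRes": 3.423, "Full AttnRes": 3.422,
             "Delta Block": 3.405, "Delta AttnRes": 3.401},
    "1044M": {"Baseline": 3.36, "AttnRes": 3.428, "Full AttnRes": 3.474,
              "Delta Block": 3.338, "Delta AttnRes": 3.339},
}


def delta_vs_baseline(scale: str) -> dict[str, float]:
    """Hypothetical helper: loss(method) - loss(Baseline) at one scale.

    Negative values mean the method reached a lower training loss than Baseline.
    """
    row = RESULTS[scale]
    base = row["Baseline"]
    # Round to 3 decimals, matching the precision reported in the table.
    return {m: round(v - base, 3) for m, v in row.items() if m != "Baseline"}


if __name__ == "__main__":
    for scale in RESULTS:
        # Print methods from largest to smallest improvement over Baseline.
        for method, d in sorted(delta_vs_baseline(scale).items(), key=lambda kv: kv[1]):
            print(f"{scale:>6}  {method:<14} {d:+.3f}")
```

Read this way, every method is below Baseline at 220M and 533M, while at 1044M only Delta Block (-0.022) and Delta AttnRes (-0.021) are below Baseline; AttnRes and Full AttnRes are above it.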