
Language Modeling on Pre-training corpus (train)

Perplexity: 15.71

Pre-LN + LayerNorm Scaling

Feb 9, 2025

Evaluation Results

Date      Perplexity   Unlabeled metric
2025.02   15.71        -
2025.02   17.02        -
2025.02   18.2         -
2025.02   19.58        -
2025.02   20.35        -
2025.02   21.39        -
2025.02   21.92        -
2025.02   22.77        -
2025.02   25.76        -
2025.02   26.07        -
2025.02   26.73        -
2025.02   26.95        -
2025.02   27.17        -
2025.02   1,362.59     -
2025.02   1,363.21     -
2025.02   1,368.33     -
2025.02   1,390.75     -
2025.02   1,409.08     -
2025.02   1,409.79     -
2025.02   1,414.78     -
2024.07   -            2.56
2024.07   -            2.62
2024.07   -            2.69
2024.07   -            2.68
2024.07   -            2.67
2024.07   -            2.27
2024.07   -            2.37
2024.07   -            2.42
2024.07   -            2.27
2024.07   -            2.26
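Perplexity, the metric this leaderboard ranks by, is conventionally defined as the exponential of the mean per-token negative log-likelihood, so lower is better. The snippet below is a minimal sketch of that standard definition. It assumes you already have per-token negative log-likelihoods in nats. The function name `perplexity` and the sample inputs are illustrative only. They are not taken from this leaderboard or from any particular evaluation harness.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over a sequence of tokens.

    token_nlls: per-token negative log-likelihoods in nats (hypothetical input;
    e.g. produced by a cross-entropy loss computed with the natural log).
    """
    if len(token_nlls) == 0:
        raise ValueError("need at least one token to compute perplexity")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# A model that assigns probability 1/16 to every token has a per-token NLL of ln(16),
# so its perplexity is 16: it is as uncertain as a uniform choice among 16 options.
nlls = [math.log(16)] * 4
print(round(perplexity(nlls), 6))  # 16.0
```

Because perplexity grows exponentially with the mean loss, the entries near 1,360 correspond to a mean per-token NLL only about 4.5 nats higher than the 15.71 entry (ln 1,362.59 minus ln 15.71 is roughly 4.46).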