TinyStories

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | TinyStories (val) | Last Loss: 1.1284 | 21 |
| Narrative Generation | TinyStories 21 (test) | Speedup (x): 1.62 | 15 |
| Text Continuation | TinyStories 1K random samples (test) | R-1: 39.7 | 10 |
| Language Modeling | TinyStories 60M tokens (val) | PPL (Val): 51.23 | 8 |
| Language Modeling | TinyStories 10k (val) | Validation Loss (nats/token): 1.1284 | 7 |
| Narrative Video Generation | TinyStories | Image Quality: 76.93 | 7 |
| Topical Text Steering | TinyStories | Average Target Score: 31.5 | 6 |
| Scaling-law extrapolation | TinyStories high-D holdout | RMSE (log space): 0.053 | 6 |
| Scaling-law extrapolation | TinyStories high-C holdout | RMSE (log space): 0.095 | 6 |
| Language Modeling Evaluation | TinyStories | Grammar: 6.63 | 5 |
| Story Generation Evaluation | TinyStories GPT-4.1 Nano | Grammar: 6.47 | 5 |
| Story Generation | TinyStories | Grammar Score: 6.37 | 5 |
| Language Generation | TinyStories (test) | Grammar: 9.93 | 5 |
| Token recovery | TinyStories | Mean Queries: 2 | 2 |
| Lineage Verification | TinyStories | p-value: 0 | 2 |
| Fingerprint persistence | TinyStories cleaned V2 | T-Test Statistic: 0 | 2 |
| Model Fingerprint Verification | TinyStories (test) | t-test p-value: 0 | 2 |
| Lineage Verification | TinyStories Continual seed 123 (train) | t-test (logits): 0.434 | 1 |
| Lineage Verification | TinyStories seed 1000 Continual (train) | t-test p-value (logits): 5.36 | 1 |