Generation Quality and Coherence Evaluation on SlimPajama Quality Evaluation (test)
[Chart: Gen Quality (Std. Prefix). Highlighted result: Self-Improving Pretraining, 86.3, Jan 29, 2026. Metrics shown: Gen Quality (Std. Prefix), Avg Standard Evals Score, Coherence Score.]
Updated 4d ago
Evaluation Results
Method                      Details                    Date     Gen Quality (Std. Prefix)  Avg Standard Evals Score  Coherence Score
Self-Improving Pretraining  Backbone=Llama 1.4B, T...  2026.01  86.3                       50.8                      87.9
Llama Base                  Backbone=Llama 1.4B, P...  2026.01  50                         47.6                      50.1
Llama Pretrain Baseline     Backbone=Llama 1.4B, T...  2026.01  49                         46.8                      49.4