Language Model Evaluation on Quality, Factuality, and Safety Evaluation Suite (test)

Generation Quality Score: 86.3

Self-Improving Pretraining

[Chart: Generation Quality Score; data point dated Jan 29, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2026.01   86.3   50.8   87.9   43.6   84.9
2026.01   84     50.5   81.4   57.6   85.1
2026.01   73.6   49.1   73.9   38     91.1
2026.01   66.1   63.1   77.1   26.3   71
2026.01   54.5   47.9   57.1   40.8   75.5
2026.01   50     47.6   50.1   42.3   76.9
2026.01   49     46.8   49.4   44     76.9