Large Language Model Evaluation on 12-task evaluation suite composite (test)

49.6 Reading Comprehension Score

FineWeb-Edu

[Chart: score axis ticks 45.856, 46.828, 47.8, 48.772; date axis Dec 30, 2025]
Updated 4d ago

Evaluation Results

Method    Links
2025.12   49.6   60.5   46.2   51.2   2   12
2025.12   49.3   60.7   45.9   51.4   2   12
2025.12   49.1   59.9   48.9   52.6   4   12
2025.12   49     60.4   47.8   52.2   1   12
2025.12   48.8   56.6   42.1   48.1   6   12
2025.12   48.3   57.6   45.3   50     1   12
2025.12   48.2   57.2   39.5   46.9   1   12
2025.12   48.2   59.6   42.7   49.5   1   12
2025.12   48.1   58.1   46.8   50.7   1   12
2025.12   47.6   57.3   40.3   47.2   2   12
2025.12   47.4   56.5   38.2   45.8       12
2025.12   46.8   56.7   40.2   46.8   3   12
2025.12   46.1   53.3   36.7   43.8       12
2025.12   46     53.6   38.8   44.9       12