
Large Language Model Evaluation on 12-task evaluation suite composite (test)

Reading Comprehension Score: 49.6

FineWeb-Edu

[Chart: Reading Comprehension Score; date shown: Dec 30, 2025]
Updated 1mo ago

Evaluation Results

Method  Links
2025.12  49.6  60.5  46.2  51.2  2  12
2025.12  49.3  60.7  45.9  51.4  2  12
2025.12  49.1  59.9  48.9  52.6  4  12
2025.12  49    60.4  47.8  52.2  1  12
2025.12  48.8  56.6  42.1  48.1  6  12
2025.12  48.3  57.6  45.3  50    1  12
2025.12  48.2  57.2  39.5  46.9  1  12
2025.12  48.2  59.6  42.7  49.5  1  12
2025.12  48.1  58.1  46.8  50.7  1  12
2025.12  47.6  57.3  40.3  47.2  2  12
2025.12  47.4  56.5  38.2  45.8     12
2025.12  46.8  56.7  40.2  46.8  3  12
2025.12  46.1  53.3  36.7  43.8     12
2025.12  46    53.6  38.8  44.9     12