Multiple Choice Question Answering on MMLU-Redux (test)

84.54 Accuracy

Qwen2.5-32B

[Chart: accuracy over time, Oct 17, 2025 to Jan 27, 2026]
Updated 1mo ago

Evaluation Results

Date      Accuracy
2025.10   84.54      -        -        -
2025.10   83.79      -        -        -
2026.01   83.28      0.1337   0.125    -0.6991
2026.01   83.28      0.1232   0.0659   -0.7096
2025.10   82.73      -        -        -
2026.01   80.52      0.1519   0.0664   -0.6534
2025.10   79.26      -        -        -
2026.01   79.21      0.172    0.1628   -0.6201
2026.01   77.35      0.1229   0.0326   -0.6506
2026.01   75.63      0.1278   0.0534   -0.6285
2026.01   71.74      0.245    0.2417   -0.4724
2026.01   71.74      0.2153   0.1679   -0.5022
2026.01   65.42      0.158    0.0176   -0.4962
2026.01   -          0.1116   0.0584   -0.7212
2026.01   -          0.1874   0.0352   -0.5299
2026.01   -          0.1571   0.0546   -0.635