
Reasoning on MMLU-Pro (Accuracy and Resource Usage)

Best accuracy: 60.11

CoT Cold-Start + Solver Feedback

[Figure: accuracy (y-axis) over time (x-axis), Nov 13, 2025 to Feb 10, 2026]

Evaluation Results

Date     Accuracy  Additional metrics
2025.11  60.11     ---
2025.11  59.78     ---
2025.11  58.79     ---
2025.11  57.86     ---
2025.11  57.12     ---
2025.11  56.79     ---
2026.02  56.62     40-5.7
2026.02  56.61     1.11.23
2026.02  56.59     9.8-5.3
2026.02  56.59     13.7-4.2
2026.02  56.36     6.6-2.9
2025.11  53.67     ---
2025.11  53.28     ---
2025.11  52.88     ---
2025.11  37.38     ---