
Zero-shot Reasoning on Advanced Reasoning Suite (MMLU-Pro, GPQA, AIME)

MMLU-Pro Accuracy: 74.8

Base

[Chart: MMLU-Pro accuracy; y-axis ticks 69.184, 70.642, 72.1, 73.558; date Oct 13, 2025]
Updated 6d ago

Evaluation Results

Method  Links
2025.10  74.8  58.6  73.3  73.3  70
2025.10  74    57.7  70    68.9  67.7
2025.10  73.8  58.8  71.1  71.2  69
2025.10  70.7  54    73.3  60    64.5
2025.10  69.8  55.2  71.1  57.8  63.5
2025.10  69.4  53.2  73.3  61.1  63.7