Language Understanding on MMLU Redux (test)

66.9 Accuracy

Ours (theory-guided context selection strategy)

[Chart: Accuracy; y-axis ticks 63.676, 64.513, 65.35, 66.187; x-axis Feb 9, 2026]
Updated 1mo ago

Evaluation Results

Method    Links
66.9    2026.02
66.8    2026.02
66.6    2026.02
66.5    2026.02
66      2026.02
65.8
65      2026.02
64.9    2026.02
64.7    2026.02
64.6    2026.02
64.1    2026.02
63.8