
In-Domain Reasoning Suite

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Mathematical Reasoning | In-domain Reasoning Suite (AIME24, AIME25, AMC23, Math500, Minerva, Olympia) (test) | AIME24 Score: 29.3 | 10
Mathematical Reasoning | In-Domain Reasoning Suite | MATH Score: 91.4 | 9
Mathematical Reasoning | In-Domain Reasoning Suite (MATH, Olympiad, AMC, AIME) | MATH Score: 94.4 | 6