
Math, CollegeMath, AIME, Minerva

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Math-500, CollegeMath, AIME24, Minerva (Average) | Average Accuracy: 28.56 | 7 |