Aggregate Mathematical Tasks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Mathematical Reasoning | Aggregate Mathematical Tasks (AIME24/25, AMC23, Minerva, OlymMATH) | Average Score: 28.3 | 16