Beyond AIME

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Mathematical Reasoning | Beyond-AIME | Avg@5 Score: 77.6 | 48 |
| Mathematical Reasoning | Beyond AIME | Accuracy: 58.8 | 45 |
| Mathematical Reasoning | Beyond-AIME v1 (test) | Avg@5: 76.6 | 32 |
| Mathematical Reasoning | Beyond AIME | Pass@1: 22 | 21 |
| Mathematical Reasoning | Beyond-AIME | Seed Score: 76.6 | 16 |
| Mathematical Reasoning | Beyond AIME | Total Tokens: 1 | 10 |
| Mathematical Reasoning | Beyond-AIME | Pass@1: 38.9 | 10 |
| Mathematical Reasoning | Beyond-AIME VeRA-H VeRA-H Pro | Avg@5 Accuracy (Seeds): 58.34 | 1 |
| Mathematical Reasoning | Beyond-AIME VeRA-E | Avg@5 Accuracy (Seeds): 58.34 | 1 |