
AMO-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | AMO-Bench | Avg@5: 0.646 | 48 |
| Mathematical Reasoning | AMO-Bench | Mean@64 Accuracy: 11.8 | 27 |
| Mathematical Reasoning | AMO-Bench | Pass@8: 36.72 | 20 |
| Mathematical Reasoning | AMO-Bench | Seed (Avg@5): 0.56 | 16 |
| Mathematical Reasoning | AMO-Bench | Average@16: 14.8 | 12 |
| Mathematical Reasoning | AMO-Bench VeRA-H / VeRA-H Pro | Avg@5 Accuracy (Seeds): 31.75 | 1 |