
AMC23

Benchmarks

Task Name                                   Dataset Name          SOTA Result                Trend
Mathematical Reasoning                      AMC23                 Pass@k: 98.6               35
Mathematical Reasoning                      AMC23 decontaminated  Accuracy: 69.8             14
Mathematical Reasoning                      AMC23                 Average Score @32: 91.4    14
Mathematical Reasoning                      AMC23                 Accuracy: 83.2             12
Multi-Turn Tool-Integrated Reasoning (TIR)  AMC23                 Peak avg@32 Score: 79.45   6
Competition Mathematics Reasoning           AMC23                 Full Length: 10.9          4