
AMC23

Benchmarks

Task Name                                   Dataset Name  SOTA Result              Trend
Mathematical Reasoning                      AMC23         Pass@k 98.6              30
Mathematical Reasoning                      AMC23         Average Score @32 91.4   14
Mathematical Reasoning                      AMC23         Accuracy 83.2            12
Multi-Turn Tool-Integrated Reasoning (TIR)  AMC23         Peak avg@32 Score 79.45  6