WorkDL

Reasoning on AIME (pass@1 accuracy)

Pass@1 Accuracy: 73.33

[Chart: Pass@1 accuracy by date, with baseline; x-axis label: Oct 1, 2025]
Updated 23d ago

Evaluation Results

Date     Pass@1 Accuracy  Method  Links
2025.10  73.33
2025.10  70.28
2025.10  67.86
2025.10  60.56
2025.10  53.33
2025.10  50
2025.10  43.33
2025.10  40
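
The scores listed here are pass@1 accuracies. As a rough illustration of how such a number is commonly computed, the sketch below averages per-problem correctness over one or more sampled answers and reports it as a percentage. The function name `pass_at_1`, the input layout, and the 30-problem example are assumptions made for illustration; the page does not state which AIME set or how many samples each method used.

```python
from typing import List


def pass_at_1(samples_correct: List[List[bool]]) -> float:
    """Return pass@1 accuracy as a percentage.

    samples_correct holds one inner list per problem; each inner list
    holds one bool per sampled answer (True if that answer was correct).
    With a single sample per problem this is plain accuracy; with several
    samples it is the mean per-problem success rate.
    """
    if not samples_correct or any(len(s) == 0 for s in samples_correct):
        raise ValueError("every problem needs at least one sampled answer")
    per_problem = [sum(s) / len(s) for s in samples_correct]
    return 100.0 * sum(per_problem) / len(per_problem)


# Hypothetical example: 22 of 30 problems solved, one sample per problem.
single_sample = [[True]] * 22 + [[False]] * 8
print(round(pass_at_1(single_sample), 2))
```

Under the assumed 30-problem set, 22 correct answers would give 73.33. Fractional scores that are not multiples of 1/30 would be consistent with averaging over several samples per problem, though the page does not confirm this.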