Mathematical Reasoning on AIME 2024 (pass@K)
[Figure: Pass@K by date (Oct 1, 2025); highlighted point: Original, 86.67]
Evaluation Results
| Method | Backbone | Date | Pass@K (%) |
|---|---|---|---|
| Original | DSQwen-14B | 2025.10 | 86.67 |
| STAR | DSQwen-14B | 2025.10 | 86.67 |
| SafeChain | DSQwen-14B | 2025.10 | 86.67 |
| DAPO | DSQwen-14B | 2025.10 | 86.67 |
| RECAP | DSQwen-14B | 2025.10 | 86.67 |
| SFT | DSQwen-14B | 2025.10 | 83.33 |
| Original | DSLlama-8B | 2025.10 | 70.00 |
| SafeChain | DSLlama-8B | 2025.10 | 70.00 |
| RECAP | DSLlama-8B | 2025.10 | 70.00 |
| STAR | DSLlama-8B | 2025.10 | 66.67 |
| DAPO | DSLlama-8B | 2025.10 | 66.67 |
| SFT | DSLlama-8B | 2025.10 | 63.33 |
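Because the reported percentages are rounded, it can help to read them as problem counts. The sketch below assumes the scores are computed over the 30 problems of AIME 2024 (AIME I and AIME II, 15 problems each); the page itself does not state the denominator, but every value in the table is within rounding of a multiple of 1/30, which is consistent with that assumption. `SCORES`, `NUM_PROBLEMS`, `solved_count`, and `delta_vs_original` are hypothetical helpers for illustration only.

```python
# Pass@K scores (%) copied from the table above, keyed by (method, backbone).
SCORES = {
    ("Original", "DSQwen-14B"): 86.67,
    ("STAR", "DSQwen-14B"): 86.67,
    ("SafeChain", "DSQwen-14B"): 86.67,
    ("DAPO", "DSQwen-14B"): 86.67,
    ("RECAP", "DSQwen-14B"): 86.67,
    ("SFT", "DSQwen-14B"): 83.33,
    ("Original", "DSLlama-8B"): 70.00,
    ("SafeChain", "DSLlama-8B"): 70.00,
    ("RECAP", "DSLlama-8B"): 70.00,
    ("STAR", "DSLlama-8B"): 66.67,
    ("DAPO", "DSLlama-8B"): 66.67,
    ("SFT", "DSLlama-8B"): 63.33,
}

# Assumption (not stated on the page): AIME 2024 = AIME I + AIME II, 15 problems each.
NUM_PROBLEMS = 30


def solved_count(percent: float, n: int = NUM_PROBLEMS) -> int:
    """Map a percentage rounded to two decimals back to a count out of n."""
    count = round(percent / 100 * n)
    # Reject scores that are not within two-decimal rounding error of count / n.
    if abs(count / n * 100 - percent) > 0.005:
        raise ValueError(f"{percent}% is not a multiple of 1/{n}")
    return count


def delta_vs_original(scores: dict) -> dict:
    """Problems gained (+) or lost (-) relative to 'Original' on the same backbone."""
    return {
        (method, backbone): solved_count(pct) - solved_count(scores[("Original", backbone)])
        for (method, backbone), pct in scores.items()
    }


if __name__ == "__main__":
    deltas = delta_vs_original(SCORES)
    for (method, backbone), pct in SCORES.items():
        print(
            f"{method:<10}{backbone:<12}{pct:6.2f}%  "
            f"{solved_count(pct):>2}/{NUM_PROBLEMS}  ({deltas[(method, backbone)]:+d})"
        )
```

Read this way, the spread on DSQwen-14B is a single problem (SFT solves 25 versus 26 for the other methods), and on DSLlama-8B it is at most two problems (SFT solves 19 versus 21 for Original).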