Mathematical Reasoning on AIME 2025 (reference set)
[Chart: pass@1 over time. Top entry: ScaleBiO, 26.7 pass@1 (Jun 28, 2024). Available metrics: pass@1, Consensus@64.]
Updated 4d ago
Evaluation Results
Method      Configuration                Date      pass@1   Consensus@64
ScaleBiO    Backbone=DeepSeek-R1-D...    2024.06   26.7     36.7
Uniform     Backbone=DeepSeek-R1-D...    2024.06   20       33.3
LESS        Backbone=DeepSeek-R1-D...    2024.06   20       36.7
RHO-LOSS    Backbone=DeepSeek-R1-D...    2024.06   20       33.3
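
The page does not define its two metrics, so the following is only a sketch of how they are commonly computed, not the leaderboard's own evaluation code. It assumes the scores are percentages, that pass@1 is the average fraction of sampled answers matching the reference answer, and that Consensus@64 is the share of problems where the majority answer over 64 samples is correct. The function names and the toy data are hypothetical. Under these assumptions, and if the set contains 30 problems, a score of 26.7 would correspond to 8 of 30 problems and 36.7 to 11 of 30.

```python
from collections import Counter


def pass_at_1(samples, answers):
    """Average, over problems, of the fraction of samples equal to the reference (in percent)."""
    per_problem = [
        sum(s == ref for s in problem_samples) / len(problem_samples)
        for problem_samples, ref in zip(samples, answers)
    ]
    return 100.0 * sum(per_problem) / len(per_problem)


def consensus_at_k(samples, answers):
    """Percent of problems whose most frequent sampled answer equals the reference."""
    correct = 0
    for problem_samples, ref in zip(samples, answers):
        # Majority vote over the k samples drawn for this problem.
        majority, _count = Counter(problem_samples).most_common(1)[0]
        correct += int(majority == ref)
    return 100.0 * correct / len(answers)


# Hypothetical toy data: 3 problems, 3 sampled answers each (a real run would use 64).
toy_samples = [["12", "12", "7"], ["5", "9", "9"], ["3", "3", "8"]]
toy_answers = ["12", "5", "3"]

print(round(pass_at_1(toy_samples, toy_answers), 1))
print(round(consensus_at_k(toy_samples, toy_answers), 1))
```

Majority voting can score higher than pass@1 because a problem counts as solved whenever the correct answer is the most frequent one, even if many individual samples are wrong, which matches the pattern in the table where Consensus@64 exceeds pass@1 for every method.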