Mathematical Reasoning on AIME 2024 (reference set)
[Chart: Pass@1 by date. Highlighted point: ScaleBiO, 33.3 (Jun 28, 2024). Available metrics: Pass@1, Consensus@64.]
Updated 4d ago
Evaluation Results
Method     Configuration               Date      Pass@1   Consensus@64
ScaleBiO   Backbone=DeepSeek-R1-D...   2024.06   33.3     33.3
RHO-LOSS   Backbone=DeepSeek-R1-D...   2024.06   30       33.3
Uniform    Backbone=DeepSeek-R1-D...   2024.06   26.7     33.3
LESS       Backbone=DeepSeek-R1-D...   2024.06   26.7     33.3