Math Reasoning on AIME25 (Pass@16, Vendi)
[Chart: Pass@16 (alternate metric: Vendi Score). Top entry: ESamp, 31.7, Apr 27, 2026.]
Evaluation Results
Method     Setting                      Date      Pass@16   Vendi Score
ESamp      Backbone=Qwen2.5-7B-In...    2026.04   31.7      0.46
Vanilla    Backbone=Qwen2.5-7B-In...    2026.04   30.3      0.32
Min-P      Backbone=Qwen2.5-7B-In...    2026.04   29.5      0.3
OverRIDE   Backbone=Qwen2.5-7B-In...    2026.04   27.7      0.35