Mathematical Reasoning on MATH (Accuracy, Delta Avg)
[Chart: MATH accuracy over time. Best result: CoT2-Meta, 92.8 Accuracy (Mar 30, 2026). Metrics: Accuracy, Delta Avg.]
Evaluation Results
| Method | Setting | Links (date) | Accuracy (%) | Delta Avg |
|---|---|---|---|---|
| CoT2-Meta | Backbone=Claude-4.5, S... | 2026.03 | 92.8 | 14.5 |
| Vanilla ToT | Backbone=Claude-4.5, S... | 2026.03 | 87.4 | 8.3 |
| Best-of-16 | Backbone=Claude-4.5, S... | 2026.03 | 84.2 | 4.8 |
| CoT2-Meta | Backbone=DeepSeek-V3.2... | 2026.03 | 84.2 | 10.5 |
| Vanilla ToT | Backbone=DeepSeek-V3.2... | 2026.03 | 78.6 | 5.7 |
| Greedy CoT | Backbone=Claude-4.5, S... | 2026.03 | 78.5 | - |
| Best-of-16 | Backbone=DeepSeek-V3.2... | 2026.03 | 75.3 | 3 |
| Greedy CoT | Backbone=DeepSeek-V3.2... | 2026.03 | 70.8 | - |
| CoT2-Meta | Backbone=Qwen2.5-VL-7B... | 2026.03 | 64.2 | 12.2 |
| Vanilla ToT | Backbone=Qwen2.5-VL-7B... | 2026.03 | 59.1 | 6.4 |
| Best-of-16 | Backbone=Qwen2.5-VL-7B... | 2026.03 | 55.8 | 3.4 |
| Greedy CoT | Backbone=Qwen2.5-VL-7B... | 2026.03 | 50.4 | - |
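To compare methods on the same backbone, the accuracy gap to the Greedy CoT row can be computed directly from the table. The following is a minimal sketch, assuming the accuracies are percentages. The backbone keys are hypothetical short names taken from the visible prefix of the truncated setting labels, and `gain_over_greedy` is an illustrative helper, not part of any published tool. These plain MATH differences do not reproduce the listed Delta Avg values (for example, 92.8 - 78.5 = 14.3 for CoT2-Meta on Claude-4.5, against a listed 14.5). Delta Avg therefore presumably uses a definition that this page does not state.

```python
# MATH accuracy (%) copied from the table above.
# Keys are hypothetical short names: the visible prefix of each truncated "Backbone=..." label.
RESULTS = {
    "Claude-4.5": {"Greedy CoT": 78.5, "Best-of-16": 84.2, "Vanilla ToT": 87.4, "CoT2-Meta": 92.8},
    "DeepSeek-V3.2": {"Greedy CoT": 70.8, "Best-of-16": 75.3, "Vanilla ToT": 78.6, "CoT2-Meta": 84.2},
    "Qwen2.5-VL-7B": {"Greedy CoT": 50.4, "Best-of-16": 55.8, "Vanilla ToT": 59.1, "CoT2-Meta": 64.2},
}


def gain_over_greedy(results, baseline="Greedy CoT"):
    """Hypothetical helper: return {backbone: {method: accuracy - baseline accuracy}}.

    Differences are rounded to one decimal place to match the table's precision.
    This is a plain per-backbone difference, not the page's Delta Avg.
    """
    gains = {}
    for backbone, scores in results.items():
        base = scores[baseline]
        gains[backbone] = {
            method: round(acc - base, 1)
            for method, acc in scores.items()
            if method != baseline
        }
    return gains


if __name__ == "__main__":
    # Print each backbone's methods from largest to smallest gain.
    for backbone, gains in gain_over_greedy(RESULTS).items():
        for method, delta in sorted(gains.items(), key=lambda kv: -kv[1]):
            print(f"{backbone:15s} {method:12s} +{delta:.1f}")
```

On all three backbones the ordering is the same: CoT2-Meta, then Vanilla ToT, then Best-of-16, all above Greedy CoT.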