Reasoning on BBEH (Accuracy, Delta Avg)
[Chart: Accuracy and Average Delta. Highest result shown: CoT2-Meta, 81.2 Accuracy, Mar 30, 2026]
Updated 18d ago
Evaluation Results
| Method      | Details                   | Date    | Accuracy | Average Delta |
|-------------|---------------------------|---------|----------|---------------|
| CoT2-Meta   | Backbone=DeepSeek-V3.2... | 2026.03 | 81.2     | 10.5          |
| Vanilla ToT | Backbone=DeepSeek-V3.2... | 2026.03 | 77.8     | 5.7           |
| Best-of-16  | Backbone=DeepSeek-V3.2... | 2026.03 | 75.9     | 3             |
| CoT2-Meta   | Backbone=Claude-4.5, S... | 2026.03 | 75.8     | 14.5          |
| Greedy CoT  | Backbone=DeepSeek-V3.2... | 2026.03 | 74.4     | -             |
| CoT2-Meta   | Backbone=Qwen2.5-VL-7B... | 2026.03 | 72.5     | 12.2          |
| Vanilla ToT | Backbone=Claude-4.5, S... | 2026.03 | 71.2     | 8.3           |
| Vanilla ToT | Backbone=Qwen2.5-VL-7B... | 2026.03 | 69.1     | 6.4           |
| Best-of-16  | Backbone=Claude-4.5, S... | 2026.03 | 68.9     | 4.8           |
| Best-of-16  | Backbone=Qwen2.5-VL-7B... | 2026.03 | 66.8     | 3.4           |
| Greedy CoT  | Backbone=Claude-4.5, S... | 2026.03 | 65.4     | -             |
| Greedy CoT  | Backbone=Qwen2.5-VL-7B... | 2026.03 | 64.5     | -             |