Multimodal Reasoning on Downstream Overall
[Chart: BoN@8 Accuracy over time; top result 55.22 (InternVL2.5-38B + EVPV-PRM), Mar 17, 2026]
Evaluation Results
| Method | Details | Date | BoN@8 Accuracy | BoN@8 vs Pass@1 Delta |
|---|---|---|---|---|
| InternVL2.5-38B + EVPV-PRM | Policy Model=InternVL2... | 2026.03 | 55.22 | 9.78 |
| Gemini-2.0-Flash | Reranking Strategy=Bes... | 2026.03 | 53.4 | - |
| InternVL2.5-38B + VisualPRM | Policy Model=InternVL2... | 2026.03 | 50.7 | 6.3 |
| Claude-3.5-Sonnet | Reranking Strategy=Bes... | 2026.03 | 50.5 | - |
| GPT-4o | Reranking Strategy=Bes... | 2026.03 | 47.9 | - |
| InternVL2.5-26B + EVPV-PRM | Policy Model=InternVL2... | 2026.03 | 46.75 | 9.52 |
| InternVL2.5-26B + VisualPRM | Policy Model=InternVL2... | 2026.03 | 45.8 | 8.9 |
| InternVL2.5-38B | Policy Model=InternVL2... | 2026.03 | 45.44 | - |
| InternVL2.5-8B + EVPV-PRM | Policy Model=InternVL2... | 2026.03 | 41.67 | 8.83 |
| InternVL2.5-8B + VisualPRM | Policy Model=InternVL2... | 2026.03 | 41.4 | 8.4 |
| InternVL2.5-26B | Policy Model=InternVL2... | 2026.03 | 37.23 | - |
| InternVL2.5-8B | Policy Model=InternVL2... | 2026.03 | 32.84 | - |
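The two metric columns can be illustrated with a minimal sketch. It assumes that "BoN@8" means sampling 8 candidate responses per problem, letting a reward model (for example a process reward model such as EVPV-PRM or VisualPRM) score each one, and counting the problem as solved if the top-scored candidate is correct; the delta is then BoN@8 accuracy minus Pass@1 accuracy. The `Candidate` record, the scalar `prm_score`, and the choice to estimate Pass@1 as the mean per-sample correctness are illustrative assumptions, not the leaderboard's actual evaluation code. For the EVPV-PRM rows, the reported delta equals the gap to the corresponding plain InternVL2.5 row (for example 55.22 - 45.44 = 9.78).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one sampled response from the policy model.
    prm_score: float  # assumed scalar score from the reward model (higher is better)
    is_correct: bool  # whether the final answer matches the reference


def best_of_n_accuracy(problems: list[list[Candidate]]) -> float:
    """Percent of problems whose highest-scored candidate is correct."""
    hits = 0
    for candidates in problems:
        best = max(candidates, key=lambda c: c.prm_score)  # rerank by reward score
        hits += best.is_correct
    return 100.0 * hits / len(problems)


def pass_at_1(problems: list[list[Candidate]]) -> float:
    """Percent accuracy of a single sample, estimated as mean per-sample correctness."""
    per_problem = [sum(c.is_correct for c in cands) / len(cands) for cands in problems]
    return 100.0 * sum(per_problem) / len(per_problem)


def bon_delta(bon: float, pass1: float) -> float:
    """BoN@N vs Pass@1 delta in percentage points, rounded as in the table."""
    return round(bon - pass1, 2)


# Toy data (hypothetical): 3 problems, 2 candidates each instead of 8.
toy = [
    [Candidate(0.9, True), Candidate(0.2, False)],
    [Candidate(0.7, True), Candidate(0.1, False)],
    [Candidate(0.3, True), Candidate(0.8, False)],  # reranker picks the wrong answer
]

if __name__ == "__main__":
    print(best_of_n_accuracy(toy), pass_at_1(toy))
    # Delta for InternVL2.5-38B + EVPV-PRM, using the plain InternVL2.5-38B row as Pass@1.
    print(bon_delta(55.22, 45.44))
```

In the toy data the reranker recovers the correct answer on two of three problems, so BoN accuracy (about 66.67) exceeds the single-sample estimate (50.0); a reward model that ranks candidates poorly would shrink or reverse that gap.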