
Multimodal Reasoning on Downstream Overall

55.22 BoN@8 Accuracy

InternVL2.5-38B + EVPV-PRM

Updated 4mo ago

Evaluation Results

Method	Links	BoN@8 Accuracy	Gain over base
InternVL2.5-38B + EVPV-PRM 2026.03		55.22	9.78
Gemini-2.0-Flash 2026.03		53.4	-
InternVL2.5-38B + VisualPRM 2026.03		50.7	6.3
Claude-3.5-Sonnet 2026.03		50.5	-
GPT-4o 2026.03		47.9	-
InternVL2.5-26B + EVPV-PRM 2026.03		46.75	9.52
InternVL2.5-26B + VisualPRM 2026.03		45.8	8.9
InternVL2.5-38B 2026.03		45.44	-
InternVL2.5-8B + EVPV-PRM 2026.03		41.67	8.83
InternVL2.5-8B + VisualPRM 2026.03		41.4	8.4
InternVL2.5-26B 2026.03		37.23	-
InternVL2.5-8B 2026.03		32.84	-
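
The last numeric column appears to be the gain in BoN@8 accuracy over the corresponding base model. That reading is an inference from the numbers, not something the page states. A minimal Python sketch that checks it for the EVPV-PRM rows, using only values copied from the table (the dictionary and function names are illustrative):

```python
# BoN@8 accuracy values copied from the table above.
base = {
    "InternVL2.5-8B": 32.84,
    "InternVL2.5-26B": 37.23,
    "InternVL2.5-38B": 45.44,
}
with_evpv_prm = {
    "InternVL2.5-8B": 41.67,
    "InternVL2.5-26B": 46.75,
    "InternVL2.5-38B": 55.22,
}
# Last-column values as listed for the EVPV-PRM rows.
listed_gain = {
    "InternVL2.5-8B": 8.83,
    "InternVL2.5-26B": 9.52,
    "InternVL2.5-38B": 9.78,
}


def gain_over_base(model: str) -> float:
    """Difference between the EVPV-PRM score and the base score, in points."""
    return round(with_evpv_prm[model] - base[model], 2)


for model in base:
    computed = gain_over_base(model)
    matches = abs(computed - listed_gain[model]) < 1e-9
    print(f"{model}: computed {computed:.2f}, listed {listed_gain[model]:.2f}, match={matches}")
```

For the VisualPRM rows the same subtraction gives 5.26, 8.57, and 8.56, which do not match the listed 6.3, 8.9, and 8.4, so those figures presumably rest on different base scores.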