
Step-wise Verification on MMMU

59.2 Macro F1

Qwen2.5-VL

Updated 4mo ago

Evaluation Results

Method	Links	Macro F1
Qwen2.5-VL 2025.11		59.2
Gemini-2.0-Flash 2025.11		58.5
TIM-PRM 2025.11		58.3
Qwen3-VL 2025.11		56.6
GPT-4o 2025.11		56.3
TIM-PRM 2025.11		55.8
VisualPRM 2025.11		54.9
GPT-4o-Mini 2025.11		53.6
Qwen2.5-VL 2025.11		53.1
InternVL2.5 2025.11		52.0
InternVL2.5 2025.11		51.5
MM-PRM 2025.11		51.2
Qwen3-VL 2025.11		50.4
InternVL2.5 2025.11		48.8
InternVL2.5 2025.11		47.1
LLaVA-OV 2025.11		46.1
LLaVA-OV 2025.11		45.7
MiniCPM-V2.6 2025.11		44.9