Multi-discipline Reasoning on MMMU-Pro (accuracy@8)
[Chart: Accuracy@8 over time. Top entry: Qwen2.5-VL-32B + VPPO, 47.1 (Oct 10, 2025)]
Evaluation Results
| Method | Details | Date | Accuracy@8 |
|---|---|---|---|
| Qwen2.5-VL-32B + VPPO | Model Scale=32B, Algor... | 2025.10 | 47.1 |
| Qwen2.5-VL-32B + DAPO | Model Scale=32B, Algor... | 2025.10 | 46.4 |
| Qwen2.5-VL-32B + GRPO | Model Scale=32B, Algor... | 2025.10 | 45.4 |
| MM-Eureka-32B | Model Scale=32B, Promp... | 2025.10 | 43.1 |
| NoisyRollout-32B | Model Scale=32B, Promp... | 2025.10 | 43.1 |
| Qwen2.5-VL-32B | Model Scale=32B | 2025.10 | 39.6 |
| Qwen2.5-VL-7B + VPPO | Model Scale=7B, Algori... | 2025.10 | 37.9 |
| VL-Rethinker-7B | Model Scale=7B, Traini... | 2025.10 | 37 |
| PAPO-D-7B | Model Scale=7B, Traini... | 2025.10 | 36.3 |
| Qwen2.5-VL-7B + DAPO | Model Scale=7B, Algori... | 2025.10 | 35.9 |
| Qwen2.5-VL-7B + GRPO | Model Scale=7B, Algori... | 2025.10 | 35.2 |
| R1-ShareVL-7B | Model Scale=7B, Traini... | 2025.10 | 35.1 |
| NoisyRollout-7B | Model Scale=7B, Traini... | 2025.10 | 34.5 |
| MM-Eureka-7B | Model Scale=7B, Traini... | 2025.10 | 30.3 |
| ThinkLite-7B | Model Scale=7B, Traini... | 2025.10 | 28 |
| Qwen2.5-VL-7B | Model Scale=7B | 2025.10 | 25.1 |