Multimodal Reasoning on General Multimodal Reasoning Aggregate
[Chart: Average Performance and Relative Gain (%) over time. Top entry: PAPO_D, 65.83 Average Performance, Jul 8, 2025.]
Updated 3d ago
Evaluation Results
| Method | Configuration | Date | Average Performance | Relative Gain (%) |
|--------|---------------|------|---------------------|-------------------|
| PAPO_D | Backbone=Qwen2.5-VL, M... | 2025.07 | 65.83 | 15.61 |
| PAPO_G | Backbone=Qwen2.5-VL, M... | 2025.07 | 63.5 | 1.53 |
| GRPO | Backbone=Qwen2.5-VL, M... | 2025.07 | 62.51 | - |
| DAPO | Backbone=Qwen2.5-VL, M... | 2025.07 | 57.58 | - |
| PAPO_D | Backbone=Qwen2.5-VL, M... | 2025.07 | 57.09 | 5 |
| DAPO | Backbone=Qwen2.5-VL, M... | 2025.07 | 55.02 | - |
| PAPO_G | Backbone=Qwen2.5-VL, M... | 2025.07 | 53.39 | 3.38 |
| GRPO | Backbone=Qwen2.5-VL, M... | 2025.07 | 51.89 | - |
| PAPO_G | Backbone=Qwen3-VL (thi... | 2025.07 | 51.36 | 4.52 |
| GRPO | Backbone=Qwen3-VL (thi... | 2025.07 | 49.13 | - |