Multi-modal Reasoning on MathVista (Accuracy)
[Chart: Accuracy over time, Nov 13, 2025 to Apr 22, 2026. Current best: AutoNPO, 79.2.]
Updated 23d ago
Evaluation Results
Method                             Configuration              Date     Accuracy
AutoNPO                            -                          2026.04  79.2
RLEP                               type=far future            2026.04  78.5
ExGRPO                             type=historical replay     2026.04  77.3
NPO                                stage=early-stage only     2026.04  76.6
NPO                                stage=early + late-stage   2026.04  76.3
GRPO                               type=pure on-policy        2026.04  76.2
Qwen3-VL-8B-Instruct               -                          2026.04  73.8
LUFFY                              type=external teacher      2026.04  73.8
Self-Instruct + Solver Feedback    Backbone=Qwen2.5-VL-7B...  2025.11  70.3
CoT Cold-Start + Solver Feedback   Backbone=Qwen2.5-VL-7B...  2025.11  70.1
Self-Instruct                      Backbone=Qwen2.5-VL-7B...  2025.11  69.8
Self-Instruct + CoT-Self-Instruct  Backbone=Qwen2.5-VL-7B...  2025.11  69.3
Self-Instruct + R-Zero             Backbone=Qwen2.5-VL-7B...  2025.11  69.3
CoT Cold-Start                     Backbone=Qwen2.5-VL-7B...  2025.11  69.1
Seed Set                           Backbone=Qwen2.5-VL-7B...  2025.11  69
Base Model                         Backbone=Qwen2.5-VL-7B...  2025.11  68.4