Multi-modal Evaluation on MME-RW
[Figure: Mean Accuracy and MME-RW Accuracy over time. Best Mean Accuracy: 31.9 (TTAug); date shown: Oct 3, 2025.]
Evaluation Results
| Method | Configuration | Date | Mean Accuracy | MME-RW Accuracy |
|---|---|---|---|---|
| TTAug | Adaptation strategy=Te... | 2025.10 | 31.9 | - |
| (2) | Adaptation strategy=Mo... | 2025.10 | 31.4 | - |
| TTAug | test-time scaling=Meth... | 2025.10 | 31.1 | - |
| (1) | Adaptation strategy=(1) | 2025.10 | 30.9 | - |
| Baseline | test-time scaling=none | 2025.10 | 27.8 | - |
| Baseline | | 2025.10 | 27.8 | - |
| Method ③ | test-time scaling=Othe... | 2025.10 | 27.6 | - |
| Method ④ | test-time scaling=Othe... | 2025.10 | 27.6 | - |
| Method ② | test-time scaling=Othe... | 2025.10 | 26.4 | - |
| Method ① | test-time scaling=Othe... | 2025.10 | 26.2 | - |
| VL-Rethinker | Reasoning Paradigm=Tex... | 2026.02 | - | 47.21 |
| PixelReasoner | Reasoning Paradigm=Thi... | 2026.02 | - | 49.7 |
| DeepEyes | Reasoning Paradigm=Thi... | 2026.02 | - | 49.5 |
| Adaptive-CoF | Reasoning Paradigm=Thi... | 2026.02 | - | 50.9 |
| MIRROR | Reasoning Paradigm=Thi... | 2026.02 | - | 51.49 |