Multi-modal Multi-image Reasoning on MMT (val)
[Chart: Accuracy over time. Best result: 67.4 (InternVL2-Llama3-76B), Jan 12, 2026]
Evaluation Results

Method                         Details                    Date     Accuracy
InternVL2-Llama3-76B           Model Size=76B, Backbo...  2026.01  67.4
GPT-4V                         Model Size=Proprietary     2026.01  64.3
Qwen2VL-7B                     Model Size=7B              2026.01  61.7
InternVL2-8B                   Model Size=8B              2026.01  57.9
LLaVA-OV-7B                    Model Size=7B              2026.01  56.6
Ours (masked) (LLaVA-OV-7B)    Model Size=7B, Backbon...  2026.01  55.3
Qwen2VL-2B                     Model Size=2B              2026.01  51.9
Ours (LLaVA-OV-1.5B)           Model Size=1.5B, Backb...  2026.01  48.8
Ours (masked) (LLaVA-OV-1.5B)  Model Size=1.5B, Backb...  2026.01  48.1
LLaVA-OV-1.5B                  Model Size=1.5B            2026.01  47.5
InternVL2-2B                   Model Size=2B              2026.01  46.7
Ours (masked) (LLaVA-OV-0.5B)  Model Size=0.5B, Backb...  2026.01  45.9
Ours (LLaVA-OV-0.5B)           Model Size=0.5B, Backb...  2026.01  45.6
LLaVA-OV-0.5B                  Model Size=0.5B            2026.01  41.1