Multi-image reasoning and general capability evaluation on NLVR2
[Chart: Accuracy; best result 90.42 (InternVL2.5), Mar 7, 2026]
Evaluation Results
Method              Params  Date     Accuracy
InternVL2.5         8B      2026.03  90.42
InternVL2.5 + CAPL  8B      2026.03  90.13
Qwen2VL             7B      2026.03  87.41
LLaVA-OV            7B      2026.03  86.82
Idefics3            8B      2026.03  85.14
GLM4.1VBase         9B      2026.03  84.98
GLM4.1VBase + CAPL  9B      2026.03  84.87
Qwen2.5-VL + CAPL   7B      2026.03  80.05
Qwen2.5-VL          7B      2026.03  79.85
InternVL2           7B      2026.03  77.68
Idefics2            8B      2026.03  56.81
LLaVA-Next          7B      2026.03  50.34
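The results above include three models listed both with and without CAPL. As a rough illustration, the accuracy difference for each pair can be computed as below. This is a minimal sketch: the dictionary names and the `capl_deltas` helper are hypothetical, and the only data used are the accuracy values copied from the results.

```python
# Accuracy values copied from the results above (NLVR2, accuracy).
# BASE, WITH_CAPL and capl_deltas are illustrative names, not part of the page.
BASE = {"InternVL2.5": 90.42, "GLM4.1VBase": 84.98, "Qwen2.5-VL": 79.85}
WITH_CAPL = {"InternVL2.5": 90.13, "GLM4.1VBase": 84.87, "Qwen2.5-VL": 80.05}


def capl_deltas(base, with_capl):
    """Return {model: with_capl - base}, rounded to two decimals."""
    return {model: round(with_capl[model] - base[model], 2) for model in base}


deltas = capl_deltas(BASE, WITH_CAPL)
for model, delta in deltas.items():
    # Explicit sign makes gains and losses easy to tell apart.
    print(f"{model}: {delta:+.2f}")
# InternVL2.5: -0.29
# GLM4.1VBase: -0.11
# Qwen2.5-VL: +0.20
```

On these numbers, all three differences are under 0.3 points in magnitude, and only the Qwen2.5-VL pair shows a gain.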