Multi-image understanding on QBench2

81.7 Accuracy

DelimScaling

Updated 4mo ago

Evaluation Results

Method	Links	Accuracy
DelimScaling 2026.02		81.7
Qwen2.5-VL 2026.02		81.4
DelimScaling 2026.02		80.1
InternVL3 2026.02		79.6
Qwen2VL 2026.03		76.8
DelimScaling 2026.02		76.6
DelimScaling 2026.02		76.5
InternVL3 2026.02		76.5
Qwen2.5-VL 2026.02		75.8
InternVL2.5 2026.03		75.5
InternVL2.5 + CAPL 2026.03		75.3
GLM4.1VBase 2026.03		74.4
DelimScaling 2026.02		74.2
LLaVA-OV 2026.02		73.9
GLM4.1VBase + CAPL 2026.03		73.6
Qwen2.5-VL + CAPL 2026.03		72.4
Qwen2.5-VL 2026.03		71.1
LLaVA-OV 2026.03		70.1
InternVL2 2026.03		69.8
DelimScaling 2026.02		65.6
InternVL3 2026.02		65.2
Idefics3 2026.03		64.2
DelimScaling 2026.02		63.3
Qwen2.5-VL 2026.02		62.7
DelimScaling 2026.02		51.9
LLaVA-OV 2026.02		51.7
InternVL3 2026.02		50.8
DelimScaling 2026.02		50.2
LLaVA-Next 2026.03		39.5
Idefics2 2026.03		38.6