
Vision-Language Reasoning on CVBench

Accuracy: 86.16

Qwen3-VL-8B + GRPO

Updated 5mo ago

Evaluation Results

Method	Date	Accuracy
Qwen3-VL-8B + GRPO	2026.01	86.16
Qwen2-VL-7B + GRPO	2026.01	75.21
Gemini-3.0-Flash	2026.01	67.2
Gemini-2.5-Pro	2026.01	62.4
Qwen2-VL-2B + GRPO	2026.01	60.31
InternVideo2.5-8B	2026.01	57.3
LLaVA-Video-7B	2026.01	52.6
GPT-4V	2026.01	52.4
Qwen2-VL-7B (baseline)	2026.01	50.7
Qwen3-VL-8B (baseline)	2026.01	45.8
Qwen2-VL-2B (baseline)	2026.01	31.38
Video-LLaVA-7B	2026.01	28.1
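
The results list three models both as a baseline and with GRPO, so the gain from GRPO can be read off by subtraction. A minimal sketch of that arithmetic follows, using only the accuracy values listed above. The names `scores` and `grpo_gains` are illustrative and not part of the leaderboard.

```python
# Accuracy values copied from the results above (percent).
scores = {
    "Qwen3-VL-8B": {"baseline": 45.8, "grpo": 86.16},
    "Qwen2-VL-7B": {"baseline": 50.7, "grpo": 75.21},
    "Qwen2-VL-2B": {"baseline": 31.38, "grpo": 60.31},
}


def grpo_gains(table):
    """Return the absolute accuracy gain (percentage points) per model."""
    return {
        name: round(entry["grpo"] - entry["baseline"], 2)
        for name, entry in table.items()
    }


gains = grpo_gains(scores)

# Print models ordered from largest to smallest gain.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain:.2f} points")
```

These are absolute differences in percentage points, not relative improvements.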