General VQA on HallusionBench
Top result: Gemini 3-Pro, 73.48 Accuracy
[Chart: Accuracy over time, Feb 4, 2026 to May 21, 2026]
Updated 12d ago
Evaluation Results

| Method | Details | Date | Accuracy |
|---|---|---|---|
| Gemini 3-Pro | | 2026.02 | 73.48 |
| Qwen2.5-VL-7B-Instruct + VPPO | Backbone model=Qwen2.5... | 2026.05 | 70.8 |
| Perception-R1-7B | Training Data=1.4K | 2026.05 | 70 |
| Qwen2.5-VL-7B-Instruct + Faithful-MR1 | Backbone model=Qwen2.5... | 2026.05 | 69.8 |
| Qwen2.5-VL-7B-Instruct + GRPO | Backbone model=Qwen2.5... | 2026.05 | 69.3 |
| Vision-R1-7B | Training Data=210K | 2026.05 | 68.1 |
| Vision-SR1-7B | Training Data=56K | 2026.05 | 68.1 |
| Qwen2.5-VL-3B-Instruct + Faithful-MR1 | Backbone model=Qwen2.5... | 2026.05 | 68 |
| Qwen2.5-VL-3B-Instruct + VPPO | Backbone model=Qwen2.5... | 2026.05 | 67.7 |
| Qwen2.5-VL-3B-Instruct + GRPO | Backbone model=Qwen2.5... | 2026.05 | 67.2 |
| GPT-5 | tier=High | 2026.02 | 66.58 |
| Qwen2.5-VL-7B-Instruct | Backbone model=Qwen2.5... | 2026.05 | 65 |
| Qwen2.5-VL-3B-Instruct | Backbone model=Qwen2.5... | 2026.05 | 64.8 |
| Qwen3-VL | mode=Thinking | 2026.02 | 64.01 |
| ERNIE 5.0 | | 2026.02 | 63.87 |
| Gemini 2.5-Pro | | 2026.02 | 63.7 |
| Qwen3-VL | Language Backbone=Qwen... | 2026.03 | 51.89 |
| Intern3.5-VL | Language Backbone=Qwen... | 2026.03 | 48.18 |
| FineViT-VL | Language Backbone=Qwen... | 2026.03 | 46.54 |
| Aquila-VL | Language Backbone=Qwen... | 2026.03 | 42.07 |