Perception on HallusionBench
Top score: 59.5 (AVAR-Thinker, Mar 4, 2026)
Evaluation Results
| Method | Model Category | Date | Score |
| --- | --- | --- | --- |
| AVAR-Thinker | Our model | 2026.03 | 59.5 |
| Claude-3.7-Sonnet | Closed-... | 2026.03 | 58.3 |
| GPT-4o | Closed-... | 2026.03 | 56.2 |
| Vision-SR1 | Multimo... | 2026.03 | 54.3 |
| Mulberry-7B | Multimo... | 2026.03 | 54.1 |
| OpenVLThinker | Multimo... | 2026.03 | 53 |
| ThinkLite-VL | Multimo... | 2026.03 | 52.3 |
| Vision-R1 | Multimo... | 2026.03 | 51.9 |
| InternVL2.5-8B | Open-So... | 2026.03 | 51.1 |
| VLAA-Thinker-7B | Multimo... | 2026.03 | 50.9 |
| Qwen2.5-VL-7B | Open-So... | 2026.03 | 50.7 |
| MM-Eureka-7B | Multimo... | 2026.03 | 50.7 |
| LLaVA-OneVision-7B | Open-So... | 2026.03 | 47.5 |
| R1-OneVision | Multimo... | 2026.03 | 46 |
| Llama-3.2-11B-Vision-Instruct | Open-So... | 2026.03 | 40.3 |