
Visual Hallucination Evaluation on HallusionBench

Accuracy (Q): 31.42 (GPT-4V)

[Chart: Accuracy (Q) over time; x-axis label: Jan 6, 2026]

Evaluation Results

Date      Accuracy (Q)   (unlabeled)   (unlabeled)   (unlabeled)
2026.01   31.42          -             44.22         67.58
2026.01   28.79          -             39.88         65.28
2026.01   21.76          -             28.61         56.86
2026.01   17.36          -             23.7          49.94
2026.01   15.6           -             18.21         45.96
2026.01   15.16          -             20.52         48.09
2026.01   13.85          -             19.94         47.3
2026.01   10.55          -             24.86         46.94
2026.01   10.55          -             9.83          40.3
2026.01   9.45           -             25.43         47.12
2026.01   9.45           -             10.11         45.26
2026.01   9.45           -             10.4          43.93
2026.01   8.79           -             10.12         35.78
2026.01   8.79           -             13.01         42.78
2026.01   7.69           -             8.67          36.85
2026.01   6.37           -             11.27         38.44
2026.01   5.93           -             6.65          39.15
2026.01   5.27           -             6.36          34.37
2026.01   5.05           -             12.43         40.48
2024.07   -              47.5          -             -
2024.07   -              46.5          -             -
2024.07   -              45.2          -             -
2024.07   -              42.4          -             -
2025.12   -              54.26         -             -
2025.12   -              58.36         -             -
2025.12   -              62.04         -             -
2025.12   -              63.62         -             -
2025.12   -              55.31         -             -
2025.12   -              54.26         -             -
2025.12   -              57.94         -             -
2025.12   -              56.05         -             -
2026.02   -              63.83         -             -
2026.02   -              63.41         -             -
2026.02   -              73.5          -             -
2026.02   -              72.24         -             -
2026.02   -              74.97         -             -