
Visual Hallucination Evaluation on HallusionBench

Accuracy: 76.6

Qwen2-VL-72B-Thinking

[Chart: accuracy over time, Jul 3, 2024 to Mar 30, 2026]
Updated 18d ago

Evaluation Results

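| Date | Accuracy | | | | Method | Links |
|---|---|---|---|---|---|---|
| 2026.03 | 76.6 | - | - | - | | |
| 2026.03 | 76.4 | - | - | - | | |
| 2026.02 | 74.97 | - | - | - | | |
| 2026.02 | 73.5 | - | - | - | | |
| 2026.03 | 73 | - | - | - | | |
| 2026.02 | 72.5 | - | - | - | | |
| 2026.02 | 72.24 | - | - | - | | |
| 2026.03 | 71.6 | - | - | - | | |
| | 70.6 | - | - | - | | |
| 2026.02 | 67.5 | - | - | - | | |
| | 65.3 | - | - | - | | |
| 2026.03 | 64.4 | - | - | - | | |
| 2026.03 | 63.9 | - | - | - | | |
| 2026.02 | 63.83 | - | - | - | | |
| 2026.02 | 63.7 | - | - | - | | |
| 2025.12 | 63.62 | - | - | - | | |
| 2026.02 | 63.41 | - | - | - | | |
| 2025.12 | 62.04 | - | - | - | | |
| 2026.02 | 61.3 | - | - | - | | |
| 2025.12 | 58.36 | - | - | - | | |
| 2025.12 | 57.94 | - | - | - | | |
| 2025.12 | 56.05 | - | - | - | | |
| 2025.12 | 55.31 | - | - | - | | |
| 2026.02 | 55.1 | - | - | - | | |
| 2026.03 | 54.5 | - | - | - | | |
| 2025.12 | 54.26 | - | - | - | | |
| 2025.12 | 54.26 | - | - | - | | |
| 2024.07 | 47.5 | - | - | - | | |
| 2024.07 | 46.5 | - | - | - | | |
| 2026.03 | 46.5 | - | - | - | | |
| 2024.07 | 45.2 | - | - | - | | |
| 2024.07 | 42.4 | - | - | - | | |
| | 41.2 | - | - | - | | |
| | 36.1 | - | - | - | | |
| | 34.8 | - | - | - | | |
| | 34.5 | - | - | - | | |
| | 28.6 | - | - | - | | |
| 2026.01 | - | 31.42 | 44.22 | 67.58 | | |
| 2026.01 | - | 28.79 | 39.88 | 65.28 | | |
| 2026.01 | - | 9.45 | 25.43 | 47.12 | | |
| 2026.01 | - | 10.55 | 24.86 | 46.94 | | |
| 2026.01 | - | 21.76 | 28.61 | 56.86 | | |
| 2026.01 | - | 7.69 | 8.67 | 36.85 | | |
| 2026.01 | - | 15.16 | 20.52 | 48.09 | | |
| 2026.01 | - | 5.93 | 6.65 | 39.15 | | |
| 2026.01 | - | 6.37 | 11.27 | 38.44 | | |
| 2026.01 | - | 10.55 | 9.83 | 40.3 | | |
| 2026.01 | - | 8.79 | 10.12 | 35.78 | | |
| 2026.01 | - | 9.45 | 10.11 | 45.26 | | |
| 2026.01 | - | 5.05 | 12.43 | 40.48 | | |
| 2026.01 | - | 13.85 | 19.94 | 47.3 | | |
| 2026.01 | - | 9.45 | 10.4 | 43.93 | | |
| 2026.01 | - | 8.79 | 13.01 | 42.78 | | |
| 2026.01 | - | 17.36 | 23.7 | 49.94 | | |
| 2026.01 | - | 5.27 | 6.36 | 34.37 | | |
| 2026.01 | - | 15.6 | 18.21 | 45.96 | | |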