Hallucination and Visual Reasoning Evaluation on HallusionBench

59.2 Score

Qwen-VL-Max-0809

[Chart: scores plotted by date, May 30, 2024 to Mar 3, 2025]
Updated 3d ago

Evaluation Results

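Date | Metric 1 | Metric 2 | Metric 3 | Metric 4 | Metric 5
2024.10 | 59.2 | - | - | - | -
2024.10 | 55.4 | - | - | - | -
- | 54.2 | - | - | - | -
- | 49.9 | - | - | - | -
- | 45.7 | - | - | - | -
- | 45.6 | - | - | - | -
2024.10 | 42.4 | - | - | - | -
- | 41.3 | - | - | - | -
- | 40.6 | - | - | - | -
- | 39.3 | - | - | - | -
2024.10 | 39 | - | - | - | -
2024.10 | 38 | - | - | - | -
2024.05 | 36.8 | - | - | - | -
2024.10 | 36.4 | - | - | - | -
2024.10 | 36.1 | - | - | - | -
2024.10 | 34.3 | - | - | - | -
2024.10 | 32.2 | - | - | - | -
2024.05 | 31.9 | - | - | - | -
2025.03 | 31.9 | - | - | - | -
2024.05 | 31.2 | - | - | - | -
2024.10 | 30.9 | - | - | - | -
2025.03 | 30.4 | - | - | - | -
2024.05 | 30 | - | - | - | -
2024.05 | 29.9 | - | - | - | -
2025.03 | 29.9 | - | - | - | -
2024.10 | 29.6 | - | - | - | -
2025.03 | 29.5 | - | - | - | -
2024.05 | 29.4 | - | - | - | -
2024.05 | 27.6 | - | - | - | -
2024.10 | 27.6 | - | - | - | -
2024.05 | 27.3 | - | - | - | -
2025.03 | 27.3 | - | - | - | -
2024.05 | 26.4 | - | - | - | -
2025.03 | 21.6 | - | - | - | -
2025.03 | 20.4 | - | - | - | -
2025.03 | 19.8 | - | - | - | -
2025.03 | 17.1 | - | - | - | -
2024.02 | - | 45.32 | 14.74 | 10.33 | -
2024.02 | - | 43.85 | 14.16 | 9.01 | -
2024.02 | - | 42.27 | 9.83 | 8.57 | -
2024.02 | - | 41.22 | 9.25 | 7.25 | -
2024.02 | - | 45.32 | 13.29 | 10.77 | -
2024.02 | - | 42.69 | 10.12 | 6.81 | -
2024.02 | - | 48.15 | 11.54 | 11.11 | -
2024.02 | - | 45.53 | 10.98 | 10.99 | -
2024.02 | - | 45.85 | 14.45 | 10.33 | -
- | - | - | - | - | 42.4
2025.12 | - | - | - | - | 38.6
2025.12 | - | - | - | - | 40.1
2025.12 | - | - | - | - | 25.8
2025.12 | - | - | - | - | 46.6
2025.11 | - | 60.07 | 41.74 | 34.3 | -
2025.11 | - | 67.17 | 51.3 | 40.07 | -
2025.11 | - | 57.36 | 21.3 | 35.38 | -
2025.11 | - | 68.35 | 46.96 | 40.43 | -
2025.11 | - | 69.2 | 47.39 | 42.24 | -
2025.11 | - | 66.16 | 49.57 | 40.07 | -
2025.11 | - | 61.76 | 43.91 | 35.38 | -
2025.11 | - | 73.6 | 56.96 | 49.46 | -
2025.11 | - | 73.1 | 56.09 | 49.82 | -
2025.11 | - | 75.3 | 58.26 | 53.79 | -
2025.11 | - | 66.5 | 47.39 | 38.99 | -
2025.11 | - | 65.48 | 47.83 | 38.27 | -
2025.11 | - | 67.17 | 47.83 | 39.71 | -
2025.11 | - | 67.17 | 43.91 | 39.35 | -
2025.11 | - | 67.68 | 44.78 | 42.24 | -
2025.11 | - | 67.01 | 39.57 | 37.91 | -
2025.11 | - | 66.5 | 47.39 | 41.52 | -
2025.11 | - | 68.19 | 44.35 | 41.16 | -
2025.11 | - | 69.37 | 48.26 | 41.52 | -
2025.11 | - | 70.9 | 50.43 | 42.24 | -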