
Multimodal Understanding on HallusionBench

77.2 Accuracy

PRISM + GRPO

[Chart: accuracy over time, Apr 23, 2026 to May 9, 2026]
Updated 15d ago

Evaluation Results

Date      Accuracy
2026.04   77.2
2026.04   76.1
2026.04   75.8
2026.04   74.8
2026.04   73.6
2026.04   72.9
2026.04   72.9
2026.04   72.6
2026.04   72.3
2026.04   72
2026.04   71.9
2026.04   71.9
2026.04   71.6
2026.04   71.5
2026.04   71.2
2026.04   70.1
2026.04   69.5
2026.04   69.1
2026.04   68.2
2026.05   63.76
2026.05   62.5
2026.05   61.77
2026.04   61.5
2026.05   61.03
2026.05   59.87
2026.05   59.87
2026.05   59.66
2026.05   59.35
2026.05   57.66
2026.05   56.61
2026.05   56.3
2026.05   55.88
2026.05   55.77
2026.05   54.62
2026.04   53.8
2026.05   51.25
2026.05   51.15