Multimodal Hallucination Evaluation on HallBench

Score: 65.2

Model: GPT-5-Thinking

[Chart: score over time. Score axis ticks: 48.352, 52.726, 57.1, 61.474. Date axis: Sep 30, 2025.]
Updated 19d ago

Evaluation Results

Date     Score
2025.09  65.2
2025.09  64.1
2025.09  57.4
2025.09  54.7
2025.09  52.5
2025.09  50
2025.09  49.5
2025.09  49.5
2025.09  49