
Multimodal Reasoning on HallusionBench

0.7293 Accuracy

TGRL-DAPO

[Chart: accuracy over time, Jul 23, 2025 to Mar 27, 2026]
Updated 19d ago

Evaluation Results

Date     Accuracy  (unlabeled)
2026.03  0.7293    -
2026.03  0.722     -
2026.03  0.7202    -
2026.03  0.7129    -
2026.03  0.7108    -
2026.03  0.71      -
2025.12  0.709     -
2026.03  0.7087    -
2026.03  0.708     -
2026.03  0.7066    -
2025.12  0.706     -
2025.12  0.7       -
2026.03  0.698     -
2025.12  0.697     -
2025.12  0.692     -
2026.03  0.686     -
2025.12  0.685     -
2025.12  0.684     -
2026.03  0.679     -
2025.12  0.672     -
2025.12  0.666     -
2025.12  0.664     -
2026.03  0.656     -
2026.03  0.646     -
2025.12  0.643     0.079
2025.12  0.632     -
2025.12  0.623     -
2025.12  0.616     -
2025.12  0.6       -
2025.12  0.599     -
2025.12  0.553     0.22
2025.07  0.458     -
2025.07  0.456     -
2025.07  0.428     -
2025.07  0.417     -
2025.07  0.399     -
2025.07  0.38      -
2025.07  0.377     -
2025.07  0.322     -
2025.07  0.084     -
2025.07  0.031     -
2025.07  0         -