
Visual Reasoning on Out-of-Domain (OOD) Aggregate (HalluBench, MathVista, MathVerse, MathVision)

OOD Avg Accuracy: 0.5531

SaEI

[Chart: OOD Avg Accuracy over time; y-axis ticks 0.497772, 0.512136, 0.5265, 0.540864; x-axis label Dec 11, 2025]
Updated 1mo ago

Evaluation Results

Date       OOD Avg Accuracy
2025.12    0.5531
2025.12    0.5474
2025.12    0.547
2025.12    0.5368
2025.12    0.4999