
Out-of-Domain (OOD) Aggregate

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Visual Reasoning | Out-of-Domain (OOD) Aggregate (HalluBench, MathVista, MathVerse, MathVision) | OOD Avg Accuracy: 0.5531 | 5