
Visual Reasoning on REASONMAP Short questions

Weighted Accuracy: 0.5998

GPT-5

Oct 2, 2025
Updated 1mo ago

Evaluation Results

Method   Date      Weighted Accuracy   (unlabeled)
GPT-5    2025.10   0.5998              0.0948
-        2025.10   0.4115              0.0684
-        2025.10   0.342               0.0525
-        2025.10   0.3151              0.0621
-        2025.10   0.2951              0.06
-        2025.10   0.2882              0.0588
-        -         0.2717              0.0568
-        -         0.2665              0.0509
-        2025.10   0.2622              0.0552
-        2025.10   0.2622              0.0557
-        -         0.1649              0.0388
-        2025.10   0.1363              0.0409
-        -         0.1328              0.0401
-        -         0.1276              0.033
-        -         0.0868              0.0275
-        -         0.0547              0.0244