
Visually Grounded Reasoning on V* Bench (test)

95 Overall Accuracy

o3

[Chart: Overall Accuracy, Mar 4, 2026]
Updated 1mo ago

Evaluation Results

Date     Overall Accuracy  Attribute  Spatial
2026.03  95                -          -
2026.03  94.2              94.8       93.4
2026.03  93.7              93.9       93.4
2026.03  91.1              93.9       86.8
2026.03  90.6              93         86.8
2026.03  90                92.1       86.8
2026.03  86.4              -          -
2026.03  85.9              86.1       85.5
2026.03  84.8              90.8       80.9
2026.03  84.3              82.6       86.8
2026.03  82.2              83.5       80.3
2026.03  82.2              85.3       77.6
-        80.6              83.5       76.3
2026.03  80                82.9       81.2
2026.03  74.8              76.3       75.4
2026.03  74.3              77.4       69.7
2026.03  62.7              53.9       59.2