
Visually Grounded Reasoning on V* Bench

95.7 Average Accuracy

o3-0416

[Chart: Average Accuracy over time, Jul 10, 2025 to Mar 4, 2026]
Updated 1mo ago

Evaluation Results

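| Date | Average Accuracy | | |
|---|---|---|---|
| 2025.07 | 95.7 | - | - |
| 2025.07 | 91.1 | 94 | 87 |
| 2026.03 | 90.6 | 93 | 86.8 |
| 2026.03 | 90 | 92.1 | 86.8 |
| 2025.07 | 90 | 92.1 | 86.8 |
| 2026.03 | 85.9 | 83.5 | 89.5 |
| 2026.03 | 85.9 | 86.1 | 85.5 |
| 2026.03 | 84.8 | 90.8 | 80.9 |
| 2025.07 | 84.8 | 90.8 | 80.9 |
| 2026.03 | 84.3 | 82.6 | 86.8 |
| 2026.03 | 82.2 | 83.5 | 80.3 |
| 2026.03 | 82.2 | 85.3 | 77.6 |
| 2026.03 | 80.6 | 83.5 | 76.3 |
| 2025.07 | 80.6 | 83.5 | 76.3 |
| 2026.03 | 77.5 | 77.4 | 77.6 |
| 2026.03 | 76.4 | 75.7 | 77.6 |
| 2025.07 | 76.4 | 75.7 | 77.6 |
| 2026.02 | 74.9 | 74.8 | 75 |
| 2026.03 | 74.3 | 77.4 | 69.7 |
| 2025.07 | 74.3 | 77.4 | 69.7 |
| 2026.03 | 73.8 | 80.9 | 63.2 |
| 2025.07 | 73.8 | 80.9 | 63.2 |
| 2026.02 | 72.8 | 73.9 | 71.1 |
| 2026.03 | 72.3 | 73 | 71.1 |
| 2025.07 | 72.3 | 73 | 71.1 |
| 2026.02 | 71.2 | 74.8 | 65.8 |
| 2026.02 | 71.2 | 73.9 | 67.1 |
| 2026.03 | 70.7 | 73 | 60.5 |
| 2025.07 | 70.7 | 73 | 60.5 |
| 2026.02 | 68.1 | 73 | 60.5 |
| 2026.03 | 66 | - | - |
| 2025.07 | 66 | - | - |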