
Spatial Reasoning on RealWorldQA

69.67 Accuracy

Full Model

[Chart: accuracy over time, Sep 30, 2025 to Mar 13, 2026]
Updated 8d ago

Evaluation Results

Method  Links
2026.02  69.67  -      -
2025.09  69.4   73.7   -
2026.02  69.28  -      -
2026.02  67.45  -      -
2025.09  66.8   70.5   -
2025.09  66.3   70.6   -
2025.12  65.9   -      -
2025.09  65.7   69.6   -
2025.09  64.3   67.9   -
2026.03  62.9   -      -
2025.12  62.2   -      -
2026.03  60.9   -      -
2026.03  57.4   -      -
2025.12  57.3   -      -
2026.03  56.5   -      -
2025.12  56.25  -      -
2026.02  55.42  -      -
2026.02  55.29  -      -
2025.12  55.28  -      -
2025.12  55.26  -      -
2025.12  55.21  -      -
2025.12  55.12  -      -
2026.02  54.64  -      -
2025.12  54.51  -      -
2025.12  54.5   -      -
2025.12  54.44  -      -
2025.12  54.4   -      -
2025.12  53.5   -      -
2025.12  53.3   -      -
2026.02  53.07  -      -
2025.12  52.94  -      -
2026.03  52.6   -      -
2025.12  52.28  -      -
2026.03  50.3   -      -
2026.03  49.8   -      -
2025.12  49.74  -      -
2025.12  47.3   -      -
2025.12  47.3   -      -
2025.12  46.9   -      -
2025.12  43.2   -      -
2025.12  42.75  -      -
2026.03  41.2   -      -
2026.02  32.55  -      -
2026.02  5.23   -      -
2026.02  0.39   -      -
2026.04  -      -      69.02
2026.04  -      -      63.05
2026.04  -      -      65.88
2026.04  -      -      62.35
2026.04  -      -      49.87
2026.04  -      -      67.41
2026.04  -      -      65.67
2026.04  -      -      66.8
2026.04  -      -      68.5
2026.04  -      -      71.9
2026.04  -      -      73.59
2026.04  -      -      66.67
2026.04  -      -      67.64
2026.04  -      -      61.83
2026.04  -      -      60.7