
Spatial Reasoning (Multi-Image) on SPAR-Bench

Top accuracy: 67.3

[Chart: accuracy by submission date, Oct 2025 to Feb 2026; legend: Human]

Evaluation Results

Date      Accuracy
2025.10   67.3
2026.02   52.6
2025.10   41.6
2025.12   40.3
2025.10   38.1
2026.02   37.6
2025.12   37.6
2025.10   37.6
2025.10   37.6
2025.10   37.4
2026.02   37.1
2026.02   36.9
2025.10   36.9
          36.3
2026.02   36
2025.10   36
2026.02   35.1
2025.12   33.6
2025.10   33.1
2025.12   33.07
2026.02   32.4
2025.10   32.4
2026.02   31.6
2025.10   31.5
2025.10   31.3
2025.10   31
2025.10   30.6
2025.12   23.65