
Spatial Reasoning (Multi-Image) on MMSI-Bench

Top result: 97.2 accuracy (Human)

[Chart: accuracy over time, Oct 10, 2025 to Mar 6, 2026]
Updated 7d ago

Evaluation Results

Rank  Method  Date     Accuracy (%)
1     Human   2025.10  97.2
2                      45.2
3             2026.03  41.8
4             2025.10  40.7
5             2026.03  38
6                      36.9
7             2025.10  36.9
8             2026.03  35.4
9             2026.03  34.8
10            2026.03  32.4
11            2025.10  32.3
12            2026.03  32
13            2025.10  31.7
14            2026.02  31.3
15            2026.03  31.1
16            2025.10  30.9
17            2025.10  30.7
18            2026.02  30.2
19            2026.03  30.2
20            2025.10  30.2
21            2026.03  30.1
22            2025.10  29.1
23            2026.03  28.9
24            2026.02  28.8
25            2026.03  28.6
26            2026.03  28
27            2026.02  27.7
28            2026.03  27.4
29            2026.03  27.4
30            2025.10  27.3
31            2026.02  27.1
32            2025.10  27
33            2026.02  26.9
34            2026.03  26.8
35            2025.10  26.8
36            2026.03  26.5
37            2026.03  26.1
38            2025.10  26.1
39            2026.03  25.8
40            2026.02  25.7
41            2026.03  25.2
42            2026.03  25
43            2026.02  24.5
44            2025.10  24.5