Spatial Reasoning over Optical/SAR Imagery on ThinkGeo
[Chart: Perception F1 Score. Leading entry: GPT-4o, 87.05 (Jan 30, 2026). Selectable metrics: Perception F1 Score, Operation F1 Score, Logic F1 Score, Answer Score, Answer Instance Metric.]
Updated 4d ago
Evaluation Results
Method                             Date     Perception F1 Score  Operation F1 Score  Logic F1 Score  Answer Score  Answer Instance Metric
GPT-4o                             2026.01  87.05                76.68               67.88           11.51         20.02
Claude-3.7-Sonnet                  2026.01  85.16                85.93               64.41           8.95          11.42
GeoEvolver (Backbone=GPT-4o-mini)  2026.01  83.9                 90.5                79.03           46.88         53.74
GPT-4-1106                         2026.01  79.91                69.15               56.29           9.46          16.91
Qwen3-8B                           2026.01  59.45                70.37               33.53           7.67          8.68