High-level Planning on ReasonMap S (short questions)
[Chart: Weighted Accuracy over time. Best result: SFT VLM, 15.44 (Nov 1, 2025). Selectable metrics: Weighted Accuracy, Output Token Count, Weighted Map Score.]
Updated 4d ago
Evaluation Results
Method     Details                    Date     Weighted Accuracy  Output Token Count  Weighted Map Score
SFT VLM    Backbone=Qwen2.5-VL-7B...  2025.11  15.44              25                  3.79
Ariadne    Backbone=Qwen2.5-VL-7B...  2025.11  14.5               43                  3.67
Base VLM   Backbone=Qwen2.5-VL-7B...  2025.11  13.32              26                  3.73