High-level Planning on ReasonMap L (long questions)
[Chart: Weighted Accuracy over time. Top entry: Ariadne, 0.0747 (Nov 1, 2025). The chart can also show Token Count and Weighted Map Score.]
Evaluation Results
Method     Configuration               Date      Weighted Accuracy   Token Count   Weighted Map Score
Ariadne    Backbone=Qwen2.5-VL-7B...   2025.11   0.0747              121           5.15
Base VLM   Backbone=Qwen2.5-VL-7B...   2025.11   0.06                61            4.51
SFT VLM    Backbone=Qwen2.5-VL-7B...   2025.11   0.041               50            3.71
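To compare the rows more easily, here is a minimal sketch that ranks the three methods by Weighted Accuracy and reports each method's percent change relative to the Base VLM row. The `Result` class and the `rank_against_baseline` and `relative_change` helpers are hypothetical names used only for illustration. The numbers are copied from the Evaluation Results table, and the sketch does not attempt to recompute the leaderboard's own metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Hypothetical container for one leaderboard row.
    method: str
    weighted_accuracy: float
    token_count: int
    weighted_map_score: float


# Values copied from the Evaluation Results table (all rows use a Qwen2.5-VL-7B backbone).
RESULTS = [
    Result("Ariadne", 0.0747, 121, 5.15),
    Result("Base VLM", 0.06, 61, 4.51),
    Result("SFT VLM", 0.041, 50, 3.71),
]


def relative_change(value: float, baseline: float) -> float:
    """Percent change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


def rank_against_baseline(results, baseline_name="Base VLM"):
    """Sort rows by weighted accuracy (descending) and attach % change vs. the baseline row."""
    baseline = next(r for r in results if r.method == baseline_name)
    ranked = sorted(results, key=lambda r: r.weighted_accuracy, reverse=True)
    return [
        (
            r.method,
            r.weighted_accuracy,
            round(relative_change(r.weighted_accuracy, baseline.weighted_accuracy), 1),
        )
        for r in ranked
    ]


if __name__ == "__main__":
    for method, acc, delta in rank_against_baseline(RESULTS):
        print(f"{method:<10} {acc:.4f} {delta:+.1f}%")
```

Using the table values, Ariadne is about 24.5% above the Base VLM in Weighted Accuracy, and the SFT VLM is about 31.7% below it. This ranking considers accuracy alone. The Token Count column shows that Ariadne also produces roughly twice as many tokens as the Base VLM (121 vs. 61).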