End-to-end planning on TravelPlanner
[Chart: Success Rate (CS/HD Avg) over time; top result Qwen 3 32B at 0.225, Dec 22, 2025]
Evaluation Results
Method | Scale group | Date | Success Rate (CS/HD Avg)
Qwen 3 32B | Large Scal... | 2025.12 | 0.225
Qwen 2.5 72B | Large Scal... | 2025.12 | 0.205
Llama 3.1 70B | Large Scal... | 2025.12 | 0.176
GenEnv | 7B Models,... | 2025.12 | 0.166
Llama 3.1 405B | Large Scal... | 2025.12 | 0.165
ReSearch | 7B Models,... | 2025.12 | 0.164
SearchR1 | 7B Models,... | 2025.12 | 0.161
GPT-OSS 20B | Large Scal... | 2025.12 | 0.149
ToRL | 7B Models,... | 2025.12 | 0.148
GPT-OSS 120B | Large Scal... | 2025.12 | 0.147
Qwen 3 14B | Large Scal... | 2025.12 | 0.147
Qwen 2.5 7B | 7B Models,... | 2025.12 | 0.143