Taxi Domain on Taxi Standard
Top result: OpenAI o4-mini, Accuracy 83.98
[Chart: Accuracy by date (Nov 28, 2025)]
Updated 4d ago
Evaluation Results
Method                 Tags                       Date     Accuracy
OpenAI o4-mini         category=Proprietary M...  2025.11  83.98
OpenAI o3              category=Proprietary M...  2025.11  72.66
Claude 4.5 Sonnet      category=Proprietary M...  2025.11  66.41
WMAct                  -                          2025.11  62.16
PPO - Interactive      mode=interactive           2025.11  39.16
PPO - EntirePlan       mode=single-turn output    2025.11  38.92
GPT-5                  category=Proprietary M...  2025.11  36.72
Gemini 2.5 Pro         category=Proprietary M...  2025.11  17.97
Qwen3-8B               category=Opensource Mo...  2025.11  9.38
Qwen3-14B              category=Opensource Mo...  2025.11  6.64
GPT-4o                 category=Proprietary M...  2025.11  6.25
Qwen3-8B-Own           backbone=Qwen3-8B          2025.11  5.6
Qwen2.5-32B-Instruct   category=Opensource Mo...  2025.11  2.34
Qwen2.5-7B-Instruct    category=Opensource Mo...  2025.11  1.17