Regression on NYC Taxi Trip Duration
Top result: MCTS-Outcome, Leaderboard Percentile 100
[Chart: Leaderboard Percentile over time; data point dated Nov 29, 2025]
Updated 4d ago
Evaluation Results
Method             Base model     Date     Leaderboard Percentile
MCTS-Outcome       GPT-4o         2025.11  100
Hierarchical MCTS  GPT-4.1-mini   2025.11  2.7
MCTS-Shaped        GPT-4.1-mini   2025.11  1.24
ReAct              GPT-4.1-mini   2025.11  0
LATS               GPT-4.1-mini   2025.11  0
MCTS-Outcome       GPT-4.1-mini   2025.11  0
ReAct              GPT-4o         2025.11  0
LATS               GPT-4o         2025.11  0
MCTS-Shaped        GPT-4o         2025.11  0
Hierarchical MCTS  GPT-4o         2025.11  0