Agent Task Completion on ToolSandbox
Top result: 0.67 Average Task Reward (GPT-5.1 with H-EPM)
[Chart: Average Task Reward over time; data point dated Dec 8, 2025]
Updated 4d ago
Evaluation Results
Method               Details                      Date      Average Task Reward
GPT-5.1 with H-EPM   Model=GPT-5.1, Enhance...    2025.12   0.67
GPT-5.1 base         Model=GPT-5.1, Configu...    2025.12   0.647