Household Planning on AHAT
Chart: Time (s) by date (Feb 12, 2026), highlighting AHAT-TGPO at 3.1 s. Selectable metrics: Time (s), Success Rate (Easy), Success Rate (Complex), Success Rate (Abstract).
Evaluation Results
Method          Date     Time (s)  Success Rate (Easy)  Success Rate (Complex)  Success Rate (Abstract)
AHAT-TGPO       2026.02  3.1       99.5                 94.6                    89
GPT-5           2026.02  8         35.7                 21.3                    11.5
Gemini-3.0-pro  2026.02  23.5      33.6                 17                      18
DELTA           2026.02  37.8      50.5                 19.7                    9.1
SayPlan         2026.02  63.7      50.3                 26.5                    20.1