Asynchronous planning on NL-AAVE (test)
Best result: 72.4 Accuracy, GPT-4o (zero-shot)
[Chart: Accuracy by date; data point labeled Feb 3, 2026]
Updated 4d ago
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| GPT-4o (zero-shot) | Mode=zero-shot | 2026.02 | 72.4 |
| Graph (40 steps) + NL (40 steps) | Model=Qwen 1.5B, Stage... | 2026.02 | 57.3 |
| 7B (NL 40 steps) | Model=Qwen 7B, Trainin... | 2026.02 | 57.3 |
| NL only (80 steps) | Model=Qwen 1.5B, Total... | 2026.02 | 50.7 |
| 3B (NL 40 steps) | Model=Qwen 3B, Trainin... | 2026.02 | 40 |
| GPT-4o-mini (zero-shot) | Mode=zero-shot | 2026.02 | 28.9 |
| NL (40 steps) + Graph (40 steps) | Model=Qwen 1.5B, Stage... | 2026.02 | 16.9 |