Asynchronous planning on NL-AAVE (test)

Top result: 72.4 accuracy, GPT-4o (zero-shot)

[Chart: accuracy by date, Feb 3, 2026]

Evaluation Results

Method                Date      Accuracy
GPT-4o (zero-shot)    2026.02   72.4
                      2026.02   57.3
                      2026.02   57.3
                      2026.02   50.7
                      2026.02   40
                      2026.02   28.9
                      2026.02   16.9