Decision Making on AlfWorld
Best result: AutoRefine, Transition Success Rate 98.4
[Chart: Transition Success Rate of listed methods over time, Dec 2024 to Jan 2026; selectable metrics: Transition Success Rate, ACC1, Delta ACC, Steps]
Evaluation Results
Method            | Setting              | Date    | Transition Success Rate | ACC1 | Delta ACC | Steps
------------------+----------------------+---------+-------------------------+------+-----------+------
AutoRefine        | Backbone=GPT-4-turbo | 2026.01 | 98.4                    | -    | -         | 12.8
Reflexion         | Backbone=GPT-4-turbo | 2026.01 | 95.5                    | -    | -         | 15.7
ReAct + Reflexion | Backbone=GPT-4-turbo | 2026.01 | 95.5                    | -    | -         | 16.1
ChatGPT o1-mini   | Model=o1-mini        | 2024.12 | 92.3                    | 1.5  | 8.2       | -
ReAct             | Backbone=GPT-4-turbo | 2026.01 | 91                      | -    | -         | 14
ChatGPT 4o        | Model=4o             | 2024.12 | 76.6                    | 14.2 | 20.9      | -
ChatGPT 3.5-turbo | Model=3.5-turbo      | 2024.12 | 76.5                    | 7.5  | 5.2       | -