Agentic Task Completion on AppWorld Leaderboard
[Chart: Greedy Success Rate over time; top result 48.8 (ReAct), Dec 1, 2025]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Greedy Success Rate |
| --- | --- | --- | --- |
| ReAct | Think=false, LLM=GPT-4... | 2025.12 | 48.8 |
| CuES | Think=false, LLM=Qwen2... | 2025.12 | 45.24 |
| PlanExec | Think=false, LLM=GPT-4... | 2025.12 | 44.6 |
| FullCodeRefl | Think=false, LLM=GPT-4... | 2025.12 | 33.9 |
| PlanExec | Think=false, LLM=GPT-4... | 2025.12 | 32.7 |
| ReAct | Think=false, LLM=GPT-4... | 2025.12 | 26.8 |
| FullCodeRefl | Think=false, LLM=GPT-4... | 2025.12 | 25.6 |
| FullCodeRefl | Think=false, LLM=LLaMA... | 2025.12 | 24.4 |
| ReAct | Think=false, LLM=DeepS... | 2025.12 | 20.8 |
| FullCodeRefl | Think=false, LLM=DeepS... | 2025.12 | 13.1 |
| PlanExec | Think=false, LLM=LLaMA... | 2025.12 | 8.9 |
| ReAct | Think=false, LLM=LLaMA... | 2025.12 | 7.1 |
| PlanExec | Think=false, LLM=DeepS... | 2025.12 | 1.8 |