Agent Task on AppWorld Average
Best Average Score: 59.5 (ReAct + ACE, Oct 6, 2025)
Updated 19d ago
Evaluation Results
Method           Details                    Date     Average Score
ReAct + ACE      Base LLM=DeepSeek-V3.1...  2025.10  59.5
ReAct + ACE      Base LLM=DeepSeek-V3.1...  2025.10  59.4
ReAct + ACE      Base LLM=DeepSeek-V3.1...  2025.10  57.2
ReAct + DC (CU)  Base LLM=DeepSeek-V3.1...  2025.10  51.9
ReAct + GEPA     Base LLM=DeepSeek-V3.1...  2025.10  46.4
ReAct + ICL      Base LLM=DeepSeek-V3.1...  2025.10  46
ReAct            Base LLM=DeepSeek-V3.1...  2025.10  42.4