Agent Task on AppWorld Normal (test)
[Chart: TGC and SGC by date. Top result: ReAct + ACE, 76.2 TGC, Oct 6, 2025.]
Updated 19d ago
Evaluation Results
Method           Configuration               Date      TGC    SGC
ReAct + ACE      Base LLM=DeepSeek-V3.1...   2025.10   76.2   64.3
ReAct + ACE      Base LLM=DeepSeek-V3.1...   2025.10   75     64.3
ReAct + ACE      Base LLM=DeepSeek-V3.1...   2025.10   69.6   53.6
ReAct + DC (CU)  Base LLM=DeepSeek-V3.1...   2025.10   65.5   58.9
ReAct + GEPA     Base LLM=DeepSeek-V3.1...   2025.10   64.9   44.6
ReAct + ICL      Base LLM=DeepSeek-V3.1...   2025.10   64.3   46.4
ReAct            Base LLM=DeepSeek-V3.1...   2025.10   63.7   42.9