Long-term Planning on AgentBench LTP
Best Task Completion Score (TCS): 32.3, Gemini-3-flash (May 27, 2026)
Evaluation Results
Method            Details                     Date      Task Completion Score (TCS)
Gemini-3-flash    Average Steps (Stp)=17...   2026.05   32.3
GPT-5-mini        Average Steps (Stp)=19...   2026.05   27.3
Qwen3-235B-A22B   Average Steps (Stp)=12...   2026.05   21.6
DeepSeek-V3.2     Average Steps (Stp)=20...   2026.05   12.6