End-to-end Task Completion on τ-Bench Retail, N=5
[Chart: Task Completion Rate. Highlighted result: Llama3.1-8B, 0.111, Oct 8, 2025.]
Evaluation Results
Method        Schema configuration   Date      Task Completion Rate
Llama3.1-8B   P...                   2025.10   0.111
Qwen2.5-7B    P...                   2025.10   0.097
Llama3.1-8B   Base                   2025.10   0.097
Qwen2.5-7B    Base                   2025.10   0.068
Llama3.2-3B   P...                   2025.10   0.056
Llama3.2-3B   Base                   2025.10   0.045
Qwen2.5-3B    P...                   2025.10   0.038
Qwen2.5-3B    Base                   2025.10   0.037
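Each model appears once per schema configuration, so the two rates can be compared directly. The following is a minimal sketch of that arithmetic, using only the values listed above. The names `RATES` and `config_deltas` are hypothetical, and "P..." is the truncated configuration label exactly as it appears in the source; its full name is not known here.

```python
# Task completion rates copied from the results listed above.
# "P..." is the truncated configuration label as shown in the source;
# it is kept verbatim because the full name is not available.
RATES = {
    ("Llama3.1-8B", "P..."): 0.111,
    ("Llama3.1-8B", "Base"): 0.097,
    ("Qwen2.5-7B", "P..."): 0.097,
    ("Qwen2.5-7B", "Base"): 0.068,
    ("Llama3.2-3B", "P..."): 0.056,
    ("Llama3.2-3B", "Base"): 0.045,
    ("Qwen2.5-3B", "P..."): 0.038,
    ("Qwen2.5-3B", "Base"): 0.037,
}


def config_deltas(rates, treated="P...", baseline="Base"):
    """Return {model: treated rate minus baseline rate}, rounded to 3 decimals."""
    models = sorted({model for model, _ in rates})
    return {
        m: round(rates[(m, treated)] - rates[(m, baseline)], 3)
        for m in models
    }


if __name__ == "__main__":
    # Print models ordered by the size of the absolute difference.
    deltas = config_deltas(RATES)
    for model, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
        print(f"{model}: {delta:+.3f}")
```

These are absolute differences in completion rate between single reported values. The source gives no variance for the N=5 setting, so the sketch says nothing about whether a difference is statistically meaningful.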