Tool Selection on ToolBench
Best F1 Score: 58.5 (OLIVIA), May 11, 2026
Updated 21d ago
Evaluation Results
Method   Backbone          Date      F1 Score
OLIVIA   Qwen3-4B          2026.05   58.5
CLIN     Qwen3-4B          2026.05   55.3
ReAct    Qwen3-4B          2026.05   52
OLIVIA   Mistral-7B-v0.1   2026.05   42
CLIN     Mistral-7B-v0.1   2026.05   39.4
ReAct    Mistral-7B-v0.1   2026.05   38.6
BM25     Qwen3-4B          2026.05   35.3
BM25     Mistral-7B-v0.1   2026.05   35.3
Bandit   Qwen3-4B          2026.05   34.8
Bandit   Mistral-7B-v0.1   2026.05   34.8
CoT      Qwen3-4B          2026.05   21.1
CoT      Mistral-7B-v0.1   2026.05   14.2