Agentic Tool-Use on BFCL
Top BFCL Average Score: 94.1 (GEAR), May 12, 2026. Updated 21d ago.
Evaluation Results
Method    Configuration              Date      BFCL Average Score
GEAR      Backbone=Qwen3-4B, Eva...  2026.05   94.1
GEAR      Backbone=Qwen3-8B, Eva...  2026.05   93.8
ARPO      Backbone=Qwen3-8B, Eva...  2026.05   92.6
MT-GRPO   Backbone=Qwen3-4B, Eva...  2026.05   92.4
ARPO      Backbone=Qwen3-4B, Eva...  2026.05   92.3
MT-GRPO   Backbone=Qwen3-8B, Eva...  2026.05   92.3
Base      Backbone=Qwen3-8B, Eva...  2026.05   92.1
GRPO      Backbone=Qwen3-8B, Eva...  2026.05   92
GRPO      Backbone=Qwen3-4B, Eva...  2026.05   91.9
GEAR      Backbone=Qwen3-4B, Eva...  2026.05   91.9
GRPO      Backbone=Qwen3-4B, Eva...  2026.05   91.1
Base      Backbone=Qwen3-4B, Eva...  2026.05   90.7
OPSD+RL   Backbone=Qwen3-8B, Eva...  2026.05   88.5
OPSD+RL   Backbone=Qwen3-4B, Eva...  2026.05   87.1
OPSD      Backbone=Qwen3-8B, Eva...  2026.05   76.4
OPSD      Backbone=Qwen3-4B, Eva...  2026.05   72