Multi-turn function calling on BFCL BASE and LONG CONTEXT multi-turn v3
[Chart: Avg@4 Success Rate and Best@4 Success Rate. Top entry: HINT-SD-Multi, 41.88 Avg@4 Success Rate, May 18, 2026. Updated 15d ago.]
Evaluation Results
Method          Details                    Date     Avg@4 Success Rate  Best@4 Success Rate
HINT-SD-Multi   backbone=Qwen3-4B-Inst...  2026.05  41.88               48.75
HINT-SD-Single  backbone=Qwen3-4B-Inst...  2026.05  36.25               43.13
GRPO            backbone=Qwen3-4B-Inst...  2026.05  31.56               41.25
SDPO            backbone=Qwen3-4B-Inst...  2026.05  30.78               40.00
SFT             mode=supervised fine-t...  2026.05  28.44               38.13
OpenClaw-RL     backbone=Qwen3-4B-Inst...  2026.05  28.28               45.00
Initial         mode=zero-shot, backbo...  2026.05  25.94               36.25
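
The page does not define the Avg@4 and Best@4 columns. A minimal sketch of one common reading follows, offered as an assumption rather than the benchmark's documented scoring: each task is attempted 4 times, Avg@4 is the mean per-attempt success rate across tasks, and Best@4 is the share of tasks solved by at least one of the 4 attempts. The function names `avg_at_k` and `best_at_k` and the toy `results` data are hypothetical and are not taken from BFCL or from any listed method.

```python
from typing import Dict, List


def avg_at_k(results: Dict[str, List[bool]]) -> float:
    """Mean per-attempt success rate across tasks, as a percentage.

    Assumed definition: average the k attempt outcomes per task,
    then average those per-task rates over all tasks.
    """
    per_task = [sum(attempts) / len(attempts) for attempts in results.values()]
    return 100.0 * sum(per_task) / len(per_task)


def best_at_k(results: Dict[str, List[bool]]) -> float:
    """Share of tasks with at least one successful attempt, as a percentage.

    Assumed definition: a task counts as solved if any of its k attempts succeeds.
    """
    solved = [any(attempts) for attempts in results.values()]
    return 100.0 * sum(solved) / len(solved)


# Toy data (hypothetical): 4 tasks, 4 attempts each.
results = {
    "task_a": [True, True, False, True],
    "task_b": [False, False, False, False],
    "task_c": [False, True, False, False],
    "task_d": [True, True, True, True],
}

print(f"Avg@4:  {avg_at_k(results):.2f}")
print(f"Best@4: {best_at_k(results):.2f}")
```

Under these assumed definitions Best@4 can never fall below Avg@4, which is consistent with every row in the table above.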