Agent Tool Use on StableToolBench Held-In
Best Pass Rate: 50.4 (SHAD+RFT, Dec 19, 2024)
Evaluation Results
| Method | Model | Date | Pass Rate |
|---|---|---|---|
| SHAD+RFT | LLaMA3.1-8B | 2024.12 | 50.4 |
| SHAD+RFT | LLaMA3-8B | 2024.12 | 50.1 |
| SHAD+α-FT | LLaMA3.1-8B | 2024.12 | 49.2 |
| SFT | LLaMA3.1-8B | 2024.12 | 48.5 |
| RewardFT | LLaMA3.1-8B | 2024.12 | 48.2 |
| SHAD+α-FT | LLaMA3-8B | 2024.12 | 47 |
| Regex+RFT | LLaMA3.1-8B | 2024.12 | 46.7 |
| RewardFT | LLaMA3-8B | 2024.12 | 44.4 |
| SFT | LLaMA3-8B | 2024.12 | 43.1 |
| Regex | LLaMA3.1-8B | 2024.12 | 42.3 |
| Regex+RFT | LLaMA3-8B | 2024.12 | 41.2 |
| Regex | LLaMA3-8B | 2024.12 | 36.2 |
| Rho-1 | LLaMA3.1-8B | 2024.12 | 30.6 |
| Rho-1 | LLaMA3-8B | 2024.12 | 24.5 |