Agent Tool Use on T-eval (Held-Out)
Chart: Accuracy over time. Best result: SHAD+RFT, 71.8 (Dec 19, 2024).
Evaluation Results
Method     Model        Date     Accuracy
---------  -----------  -------  --------
SHAD+RFT   LLaMA3-8B    2024.12  71.8
SHAD+α-FT  LLaMA3-8B    2024.12  68.8
Rho-1      LLaMA3-8B    2024.12  68.4
SHAD+RFT   LLaMA3.1-8B  2024.12  68.3
SFT        LLaMA3-8B    2024.12  67.0
Rho-1      LLaMA3.1-8B  2024.12  67.0
RewardFT   LLaMA3.1-8B  2024.12  66.4
RewardFT   LLaMA3-8B    2024.12  66.3
SFT        LLaMA3.1-8B  2024.12  64.2
SHAD+α-FT  LLaMA3.1-8B  2024.12  63.8
Regex+RFT  LLaMA3-8B    2024.12  61.1
Regex      LLaMA3.1-8B  2024.12  58.6
Regex+RFT  LLaMA3.1-8B  2024.12  57.6
Regex      LLaMA3-8B    2024.12  54.3
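The leaderboard does not designate a baseline. One way to read the results is to compare each method against plain SFT on the same base model. Below is a minimal Python sketch of that comparison. Treating SFT as the reference point is our assumption, not the leaderboard's. `RESULTS` and `gains_over_sft` are hypothetical names, and the accuracy values are copied from the results listed above.

```python
# Hypothetical data structure: rows copied from the Evaluation Results table.
# Each entry is (method, base model, accuracy).
RESULTS = [
    ("SHAD+RFT", "LLaMA3-8B", 71.8),
    ("SHAD+α-FT", "LLaMA3-8B", 68.8),
    ("Rho-1", "LLaMA3-8B", 68.4),
    ("SHAD+RFT", "LLaMA3.1-8B", 68.3),
    ("SFT", "LLaMA3-8B", 67.0),
    ("Rho-1", "LLaMA3.1-8B", 67.0),
    ("RewardFT", "LLaMA3.1-8B", 66.4),
    ("RewardFT", "LLaMA3-8B", 66.3),
    ("SFT", "LLaMA3.1-8B", 64.2),
    ("SHAD+α-FT", "LLaMA3.1-8B", 63.8),
    ("Regex+RFT", "LLaMA3-8B", 61.1),
    ("Regex", "LLaMA3.1-8B", 58.6),
    ("Regex+RFT", "LLaMA3.1-8B", 57.6),
    ("Regex", "LLaMA3-8B", 54.3),
]


def gains_over_sft(results, model):
    """Hypothetical helper: accuracy minus the SFT accuracy of the same base model.

    Assumes each method appears at most once per base model, as in the table.
    Differences are rounded to one decimal place to match the table's precision.
    """
    rows = {method: acc for method, base, acc in results if base == model}
    baseline = rows["SFT"]  # assumed reference point, not defined by the leaderboard
    return {
        method: round(acc - baseline, 1)
        for method, acc in rows.items()
        if method != "SFT"
    }


if __name__ == "__main__":
    for model in ("LLaMA3-8B", "LLaMA3.1-8B"):
        gains = gains_over_sft(RESULTS, model)
        # Print methods from largest gain to largest drop relative to SFT.
        for method, delta in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{model:<12} {method:<10} {delta:+.1f}")
```

Running the script prints each method's difference from SFT for both base models. The output is sorted from the largest gain to the largest drop.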