Agent Tool Use on Nexus (Held-Out)
[Chart: Accuracy over time. Best result: SHAD+RFT, 32 (Dec 19, 2024)]
Evaluation Results
Method       Model         Date     Accuracy
SHAD+RFT     LLaMA3.1-8B   2024.12  32
SHAD+α-FT    LLaMA3.1-8B   2024.12  28.9
SHAD+α-FT    LLaMA3-8B     2024.12  28.7
SHAD+RFT     LLaMA3-8B     2024.12  27.8
Rho-1        LLaMA3.1-8B   2024.12  26
SFT          LLaMA3.1-8B   2024.12  19.5
RewardFT     LLaMA3.1-8B   2024.12  19.1
Rho-1        LLaMA3-8B     2024.12  19
Regex+RFT    LLaMA3.1-8B   2024.12  16.2
Regex        LLaMA3.1-8B   2024.12  14.3
SFT          LLaMA3-8B     2024.12  14
Regex+RFT    LLaMA3-8B     2024.12  12.4
RewardFT     LLaMA3-8B     2024.12  8
Regex        LLaMA3-8B     2024.12  6.45
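One way to read these results is to compare each method against plain SFT on the same backbone. The Python sketch below does that arithmetic on the accuracies listed in the evaluation results. Treating SFT as the baseline is an illustrative assumption, not something the leaderboard prescribes, and the names `RESULTS` and `gains_over_baseline` are hypothetical.

```python
# Accuracy on Agent Tool Use, Nexus (Held-Out), as listed in the evaluation results.
# Each row: (method, backbone model, accuracy).
RESULTS = [
    ("SHAD+RFT", "LLaMA3.1-8B", 32.0),
    ("SHAD+α-FT", "LLaMA3.1-8B", 28.9),
    ("SHAD+α-FT", "LLaMA3-8B", 28.7),
    ("SHAD+RFT", "LLaMA3-8B", 27.8),
    ("Rho-1", "LLaMA3.1-8B", 26.0),
    ("SFT", "LLaMA3.1-8B", 19.5),
    ("RewardFT", "LLaMA3.1-8B", 19.1),
    ("Rho-1", "LLaMA3-8B", 19.0),
    ("Regex+RFT", "LLaMA3.1-8B", 16.2),
    ("Regex", "LLaMA3.1-8B", 14.3),
    ("SFT", "LLaMA3-8B", 14.0),
    ("Regex+RFT", "LLaMA3-8B", 12.4),
    ("RewardFT", "LLaMA3-8B", 8.0),
    ("Regex", "LLaMA3-8B", 6.45),
]


def gains_over_baseline(results, baseline="SFT"):
    """Return {(method, model): accuracy minus the baseline's accuracy on the same model}.

    Assumption: the baseline method (SFT by default) appears once per backbone.
    """
    base = {model: acc for method, model, acc in results if method == baseline}
    return {
        (method, model): round(acc - base[model], 2)
        for method, model, acc in results
        if method != baseline and model in base
    }


if __name__ == "__main__":
    gains = gains_over_baseline(RESULTS)
    # Print methods from largest to smallest gain over SFT.
    for (method, model), delta in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{method:<10} {model:<12} {delta:+.2f}")
```

With these numbers, both SHAD variants gain roughly 9 to 15 points over SFT on each backbone, while Regex, Regex+RFT and RewardFT score below SFT on both.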