
SkillCraft

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Regression | Skillcraft (test) | Log Likelihood (Test): 0.0078 | 17 |
| Tool-use task completion | SkillCraft Hard | Success Rate: 20 | 16 |
| Tool-use task completion | SkillCraft Overall | Avg Tokens (M): 0.26 | 16 |
| Regression | SkillCraft1 Master Table | Average RMSE: 5.201 | 9 |
| Regression | SkillCraft (3 seeds) | RMSE (Avg): 5.201 | 9 |
| Conditional Density Estimation | Skillcraft 4D (test) | Negative Log-Likelihood (NLL): 1.65 | 6 |