
AlpacaFarm

Benchmarks

Task Name                      Dataset Name              SOTA Result               Trend
Instruction Following          AlpacaFarm (test)         Reward Score 387.196      40
Direct Prompt Injection        AlpacaFarm (208 samples)  Naive Success Rate 78.36  30
Instruction Following          AlpacaFarm Eval (test)    Win Rate 76.13            28
Instruction Following          AlpacaFarm                Win Rate 59.2             15
Generation quality evaluation  AlpacaFarm                Win Rate 36.4             12