
Koala

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Preference Alignment | Koala | Wins (Count): 196 | 14 |
| Instruction Following | Koala low-resource | Win Rate (bn): 58.3 | 7 |
| Overrefusal Evaluation | Koala | Refusal Rate: 4.44 | 6 |
| Camera-controlled video generation | Koala | RotErr: 0.637 | 6 |
| Personalized LLM response generation | Koala (test) | Win Rate (Reward Model): 88 | 3 |
| Preference Alignment | Koala | Win Rate (Reward Model): 77.75 | 3 |