
Vicuna Bench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
LLM-as-a-judge evaluation | Vicuna Bench | Pearson Correlation (r): 0.605 | 16
Feedback Evaluation Alignment | Vicuna Bench | Kendall's Tau: 0.423 | 6
Feedback evaluation | Vicuna Bench (test) | Kendall's Tau: 0.468 | 5