PreferenceBench

Benchmarks

Task Name                              | Dataset Name    | SOTA Result  | Trend
LLM-as-a-Judge                         | PreferenceBench | Rstd 0.69    | 36
LLM-as-a-Judge Evaluation Consistency  | PreferenceBench | Kappa 79.73  | 4