
AlignBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Dialogue Alignment Evaluation | AlignBench | Reasoning: 6.76 | 90 |
| Instruction Following | AlignBench | Reasoning Score: 7.42 | 60 |
| Pointwise Grading | AlignBench | Pearson (r): 0.997 | 38 |
| General LLM Evaluation | AlignBench | Reasoning Score: 7.27 | 20 |
| Pairwise Comparison | AlignBench | Agreement: 74.69 | 18 |
| Subjective Alignment | AlignBench | Subjective Score (0-10): 6.8 | 10 |
| Open-ended QA Response Ranking | AlignBench Minos | K Score: 47.68 | 9 |
| Alignment | AlignBench v1 (test) | Score: 7.21 | 5 |
| Alignment | AlignBench | Score: 8.27 | 4 |