LLM Alignment Evaluation on Human Evaluation
[Chart: Coherence Score by method. Top result: Hard-Pair-GRPO, 4.42 (May 7, 2026). The chart can also show Helpfulness, Harmlessness, Relevance and Overall scores.]
Evaluation Results
| Method | Date | Coherence | Helpfulness | Harmlessness | Relevance | Overall |
|---|---|---|---|---|---|---|
| Hard-Pair-GRPO | 2026.05 | 4.42 | 4.40 | 4.51 | 4.45 | 4.45 |
| ORPO | 2026.05 | 4.28 | 4.26 | 4.36 | 4.32 | 4.31 |
| DPO | 2026.05 | 4.25 | 4.22 | 4.33 | 4.29 | 4.27 |
| Soft-Pair-GRPO | 2026.05 | 4.20 | 4.15 | 4.28 | 4.23 | 4.22 |
| Standard GRPO | 2026.05 | 4.12 | 4.05 | 4.21 | 4.18 | 4.14 |
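For every row above, the Overall score equals the unweighted mean of the four dimension scores, rounded half-up to two decimals. The page does not say how Overall is computed, so this is an assumption. The sketch below checks that assumption against the listed rows and computes the per-column gap between Hard-Pair-GRPO and Standard GRPO. The helper names (`mean_of_dimensions`, `check_overall`, `gain_over`) are hypothetical and used only for illustration. `Decimal` is used so that values such as 4.445 round exactly, which is not guaranteed with binary floats.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the table above, in column order:
# (coherence, helpfulness, harmlessness, relevance, overall).
RESULTS = {
    "Hard-Pair-GRPO": ("4.42", "4.40", "4.51", "4.45", "4.45"),
    "ORPO": ("4.28", "4.26", "4.36", "4.32", "4.31"),
    "DPO": ("4.25", "4.22", "4.33", "4.29", "4.27"),
    "Soft-Pair-GRPO": ("4.20", "4.15", "4.28", "4.23", "4.22"),
    "Standard GRPO": ("4.12", "4.05", "4.21", "4.18", "4.14"),
}

CENT = Decimal("0.01")


def mean_of_dimensions(scores):
    """Unweighted mean of the four dimension scores, rounded half-up to 2 decimals.

    Assumption: the leaderboard does not document how Overall is derived.
    """
    dims = [Decimal(s) for s in scores[:4]]
    return (sum(dims) / len(dims)).quantize(CENT, rounding=ROUND_HALF_UP)


def check_overall(results):
    """Map each method to (recomputed mean, reported Overall)."""
    return {m: (mean_of_dimensions(s), Decimal(s[4])) for m, s in results.items()}


def gain_over(results, method, baseline):
    """Per-column score difference: method minus baseline."""
    return tuple(
        Decimal(a) - Decimal(b) for a, b in zip(results[method], results[baseline])
    )


if __name__ == "__main__":
    for method, (recomputed, reported) in check_overall(RESULTS).items():
        print(f"{method:15s} recomputed={recomputed} reported={reported} "
              f"match={recomputed == reported}")
    print(gain_over(RESULTS, "Hard-Pair-GRPO", "Standard GRPO"))
```

Under this reading, Hard-Pair-GRPO improves on Standard GRPO by 0.27 to 0.35 points in every column, and the largest gain is in Helpfulness.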