Preference Optimization on HelpSteer2 (test)
[Chart: Avg Pref Score vs QWEN2.5-0.5B for STACKELBERGGDA, data point dated Dec 18, 2025. Available metrics: Avg Pref Score vs QWEN2.5-0.5B, vs RLOO, vs NASH-MD-PG, vs STACKELBERGGDA-LEADER, vs STACKELBERGGDA-FOLLOWER.]
Updated 1mo ago
Evaluation Results
Avg Pref Score of each method (row) vs each opponent (column):

| Method | Date | vs QWEN2.5-0.5B | vs RLOO | vs NASH-MD-PG | vs STACKELBERGGDA-LEADER | vs STACKELBERGGDA-FOLLOWER |
|---|---|---|---|---|---|---|
| STACKELBERGGDA (Role=Follower) | 2025.12 | 0.8 | 0.656 | 0.594 | 0.605 | 0 |
| STACKELBERGGDA (Role=Leader) | 2025.12 | 0.734 | 0.613 | 0.503 | 0 | 0.395 |
| NASH-MD-PG | 2025.12 | 0.721 | 0.607 | 0 | 0.497 | 0.406 |
| RLOO | 2025.12 | 0.593 | 0 | 0.393 | 0.387 | 0.344 |
| QWEN2.5-0.5B | 2025.12 | 0 | 0.407 | 0.279 | 0.266 | 0.2 |
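The pairwise scores above can be sanity-checked with a few lines of Python. The sketch below is only an illustration, not part of the benchmark: it assumes that the rows labelled Role=Leader and Role=Follower correspond to the STACKELBERGGDA-LEADER and STACKELBERGGDA-FOLLOWER columns, and that a 0 entry marks a method compared against itself. The names `METHODS`, `ROWS`, and `max_complement_gap` are hypothetical. Under those assumptions, it checks whether the score of A vs B and the score of B vs A sum to 1 for every pair.

```python
# Opponent (column) order, as listed in the results above.
METHODS = [
    "QWEN2.5-0.5B",
    "RLOO",
    "NASH-MD-PG",
    "STACKELBERGGDA-LEADER",
    "STACKELBERGGDA-FOLLOWER",
]

# Avg Pref Score of each method (key) against each opponent in METHODS order.
# Assumption: the Role=Leader / Role=Follower rows match the -LEADER / -FOLLOWER
# columns, and 0 is a placeholder for a method compared against itself.
ROWS = {
    "STACKELBERGGDA-FOLLOWER": [0.8, 0.656, 0.594, 0.605, 0],
    "STACKELBERGGDA-LEADER": [0.734, 0.613, 0.503, 0, 0.395],
    "NASH-MD-PG": [0.721, 0.607, 0, 0.497, 0.406],
    "RLOO": [0.593, 0, 0.393, 0.387, 0.344],
    "QWEN2.5-0.5B": [0, 0.407, 0.279, 0.266, 0.2],
}

# scores[a][b] = Avg Pref Score of method a versus method b.
scores = {method: dict(zip(METHODS, values)) for method, values in ROWS.items()}


def max_complement_gap(scores):
    """Return the largest |score(a, b) + score(b, a) - 1| over all pairs a != b."""
    gap = 0.0
    for a in scores:
        for b in scores:
            if a != b:
                gap = max(gap, abs(scores[a][b] + scores[b][a] - 1.0))
    return gap


if __name__ == "__main__":
    # A gap near zero means the reported pairwise scores are mutually consistent.
    print(f"max complement gap: {max_complement_gap(scores):.3g}")
```

A gap close to zero indicates that each pair of entries describes the same head-to-head comparison from both sides, so only one triangle of the table carries independent information.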