Human Preference Alignment on HH (test)
[Chart: Reward over time. Highlighted result: TRE-P, Reward 3.8764. Date shown: Feb 3, 2026.]
Updated 4d ago
Evaluation Results
Method           Configuration               Date      Reward
TRE-P            Backbone=Qwen2.5-1.5B-...   2026.02   3.8764
TRE-K            Backbone=Qwen2.5-1.5B-...   2026.02   3.8209
KL-Cov           Backbone=Qwen2.5-1.5B-...   2026.02   3.3931
Forking-Tokens   Backbone=Qwen2.5-1.5B-...   2026.02   3.3184
Vanilla (PPO)    Backbone=Qwen2.5-1.5B-...   2026.02   3.24
Ent              Backbone=Qwen2.5-1.5B-...   2026.02   3.1913
TRE-P            Backbone=Qwen2.5-7B-In...   2026.02   3.0331
TRE-K            Backbone=Qwen2.5-7B-In...   2026.02   2.9715
KL-Cov           Backbone=Qwen2.5-7B-In...   2026.02   2.9529
Vanilla (PPO)    Backbone=Qwen2.5-7B-In...   2026.02   2.8839
Ent              Backbone=Qwen2.5-7B-In...   2026.02   2.8718
Forking-Tokens   Backbone=Qwen2.5-7B-In...   2026.02   2.849
Base             Backbone=Qwen2.5-7B-In...   2026.02   2.0749
Base             Backbone=Qwen2.5-1.5B-...   2026.02   0.0036