Humor on HH-RLHF (test)
[Chart: Reward over time. Best result: 2.481 (PPO), Aug 19, 2025. Metrics: Reward, KL Divergence.]
Updated 4d ago
Evaluation Results
| Method | Model Size | Date | Reward | KL Divergence |
|--------|------------|---------|--------|---------------|
| PPO | 13B | 2025.08 | 2.481 | 12.14 |
| MAVIS | 13B | 2025.08 | 2.465 | 9.2 |
| MAVIS | 7B | 2025.08 | 2.376 | 3.78 |
| PPO | 7B | 2025.08 | 2.362 | 10.43 |