Dialogue Generation on full-hh-rlhf (test)
[Chart: Win Rate (Beaver-7b-v3.0-reward). Highlighted point: ReMax+XRLHF, 79.3, Dec 15, 2025. Selectable metrics: Win Rate (Beaver-7b-v3.0-reward), Win Rate (GPT-4).]
Evaluation Results
| Method | Base Model | Date | Win Rate (Beaver-7b-v3.0-reward) | Win Rate (GPT-4) |
|---|---|---|---|---|
| ReMax+XRLHF | pythia-2.8B | 2025.12 | 79.3 | 80.1 |
| ReMax+XRLHF | opt-1.3B | 2025.12 | 77.2 | 76.1 |
| PPO+XRLHF | opt-1.3B | 2025.12 | 76.4 | 78.5 |
| PPO+XRLHF | pythia-2.8B | 2025.12 | 75.8 | 76.5 |
| ReMax | pythia-2.8B | 2025.12 | 71.4 | 66.8 |
| ReMax | opt-1.3B | 2025.12 | 70.6 | 66.9 |
| PPO | pythia-2.8B | 2025.12 | 69.8 | 67.5 |
| PPO | opt-1.3B | 2025.12 | 68.3 | 68 |