Dialogue Generation on full-hh-rlhf (test)

79.3 Win Rate (Beaver-7b-v3.0-reward)

ReMax+XRLHF

Updated 4d ago

Evaluation Results

Method	Links	Win Rate (Beaver-7b-v3.0-reward)	
ReMax+XRLHF 2025.12		79.3	80.1
ReMax+XRLHF 2025.12		77.2	76.1
PPO+XRLHF 2025.12		76.4	78.5
PPO+XRLHF 2025.12		75.8	76.5
ReMax 2025.12		71.4	66.8
ReMax 2025.12		70.6	66.9
PPO 2025.12		69.8	67.5
PPO 2025.12		68.3	68