Dialogue Generation on full-hh-rlhf (test)

79.3 Win Rate (Beaver-7b-v3.0-reward)

ReMax+XRLHF

[Chart: Win Rate, Dec 15, 2025]
Updated 4d ago

Evaluation Results

Method    Links
2025.12   79.3   80.1
2025.12   77.2   76.1
2025.12   76.4   78.5
2025.12   75.8   76.5
2025.12   71.4   66.8
2025.12   70.6   66.9
2025.12   69.8   67.5
2025.12   68.3   68