Conversational Assistant on HH-RLHF

Reward: 0.5

RLOO

Updated 5mo ago

Evaluation Results

Method	Links
RLOO 2025.03		0.5	5	0.0017	-
ZOPrO 2025.03		0.25	120	0.0031	-
PPO 2025.03		-0.4	530	-0.0012	-