
Reward Modeling on Anthropic HH (test)

Top result: 68.49 accuracy, Dahoas/gptj-rm-static


Evaluation Results

Method	Date	Accuracy (%)
Dahoas/gptj-rm-static	2023.04	68.49
Alpaca-RRHF_DP	2023.04	61.75
Alpaca-PPO	2023.04	46.03
Alpaca	2023.04	45.13
LLaMA	2023.04	45.09