Reward Modeling on Anthropic HH (test)
Top result: Dahoas/gptj-rm-static, Accuracy 68.49
[Chart: Accuracy over time; plotted point dated Apr 11, 2023]
Evaluation Results
Method                  Setting                     Date      Accuracy
Dahoas/gptj-rm-static   -                           2023.04   68.49
Alpaca-RRHF_DP          Training Algorithm=RRH...   2023.04   61.75
Alpaca-PPO              Training Algorithm=PPO      2023.04   46.03
Alpaca                  -                           2023.04   45.13
LLaMA                   -                           2023.04   45.09