Reward Modeling on RM-Bench Chat subset Normal
[Chart: Accuracy by date. Highlighted point: REWARDAGENT_MINI, Accuracy 86, Feb 26, 2025]
Updated 4d ago
Evaluation Results

Method                            Details                    Links    Accuracy
REWARDAGENT_MINI                  Backbone=GPT-4o mini,...   2025.02  86
REWARDAGENT_MINI                  Backbone=GPT-4o mini,...   2025.02  84.2
DeepSeek-R1                       Method Category=LLM as...  2025.02  83.7
Skywork-Reward-Gemma-2-27B        Backbone=Gemma-2-27B,...   2025.02  82.7
REWARDAGENT_LLAMA                 Backbone=Llama3-8B Ins...  2025.02  79.3
Skywork-Reward-Llama-3.1-8B-v0.2  Backbone=Llama-3.1-8B,...  2025.02  78
INF-ORM-Llama3.1-70B              Backbone=Llama-3.1-70B...  2025.02  77.5
ArmoRM-Llama3-8B-v0.1             Backbone=Llama-3-8B, M...  2025.02  76.7
o3-mini                           Method Category=LLM as...  2025.02  76
REWARDAGENT_LLAMA                 Backbone=Llama3-8B Ins...  2025.02  76
internlm2-20b-reward              Backbone=internlm2-20b...  2025.02  74.4
internlm2-7b-reward               Backbone=internlm2-7b,...  2025.02  72.6
GPT-4o                            Method Category=LLM as...  2025.02  71.4
GPT-4o mini                       Method Category=LLM as...  2025.02  60.5
DeepSeek-R1-Distill-Llama-8B      Backbone=Llama-8B, Met...  2025.02  42.1
Llama3-8B Instruct                Backbone=Llama3-8B, Me...  2025.02  9.3