Reward Modeling on JudgeBench Knowledge
Top result: DeepSeek-R1, Accuracy 74.4 (Feb 26, 2025)
Evaluation Results
| Method | Date | Accuracy |
| --- | --- | --- |
| DeepSeek-R1 | 2025.02 | 74.4 |
| REWARDAGENT_MINI (Search Engine=false) | 2025.02 | 68.2 |
| o3-mini | 2025.02 | 66.6 |
| GPT-4o | 2025.02 | 64.6 |
| internlm2-20b-reward | 2025.02 | 61.7 |
| REWARDAGENT_MINI (Search Engine=true) | 2025.02 | 60.7 |
| INF-ORM-Llama3.1-70B | 2025.02 | 59.1 |
| Skywork-Reward-Llama-3.1-8B-v0.2 | 2025.02 | 57.8 |
| internlm2-7b-reward | 2025.02 | 56.2 |
| Skywork-Reward-Gemma-2-27B | 2025.02 | 55.8 |
| REWARDAGENT_LLAMA (Search Engine=true) | 2025.02 | 55.2 |
| REWARDAGENT_LLAMA (Search Engine=false) | 2025.02 | 52.9 |
| ArmoRM-Llama3-8B-v0.1 | 2025.02 | 51.9 |
| GPT-4o mini | 2025.02 | 51.9 |
| DeepSeek-R1-Distill-Llama-8B | 2025.02 | 47.7 |
| Llama3-8B Instruct | 2025.02 | 2.6 |