Reward Modeling on IFBench Normal
[Chart: Accuracy by date. Top result: REWARDAGENT_MINI, 80.5 Accuracy (Feb 26, 2025).]
Updated 4d ago
Evaluation Results
Method                            Search Engine  Date     Accuracy
REWARDAGENT_MINI                  true           2025.02  80.5
o3-mini                           -              2025.02  76.3
DeepSeek-R1                       -              2025.02  74.4
INF-ORM-Llama3.1-70B              -              2025.02  69.2
Skywork-Reward-Llama-3.1-8B-v0.2  -              2025.02  69.2
REWARDAGENT_LLAMA                 true           2025.02  69.2
REWARDAGENT_MINI                  false          2025.02  69.2
Skywork-Reward-Gemma-2-27B        -              2025.02  68.4
internlm2-20b-reward              -              2025.02  68.4
ArmoRM-Llama3-8B-v0.1             -              2025.02  66.2
GPT-4o                            -              2025.02  66.2
REWARDAGENT_LLAMA                 false          2025.02  63.9
internlm2-7b-reward               -              2025.02  61.7
GPT-4o mini                       -              2025.02  59.4
DeepSeek-R1-Distill-Llama-8B      -              2025.02  55.6
Llama3-8B Instruct                -              2025.02  12.8