Reward Model Evaluation on RewardBench (Accuracy)
Top result: SAVE, 93.9 Accuracy (May 29, 2026)
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| SAVE | Policy Model=Qwen3-4B-... | 2026.05 | 93.9 |
| SAVE | Policy Model=Qwen2.5-3... | 2026.05 | 93.6 |
| SAVE (w/o Curriculum Mechanism) | Policy Model=Qwen2.5-3... | 2026.05 | 93.6 |
| SAVE (w/o Policy Model Optimization) | Policy Model=Qwen2.5-3... | 2026.05 | 93.6 |
| SAVE (w/o Curriculum Mechanism) | Policy Model=Qwen3-4B-... | 2026.05 | 93.6 |
| SAVE (w/o Policy Model Optimization) | Policy Model=Qwen3-4B-... | 2026.05 | 93.5 |
| HL-BT | Policy Model=Qwen3-4B-... | 2026.05 | 93.3 |
| HL-BT | Policy Model=Qwen2.5-3... | 2026.05 | 93.2 |
| Continual Offline Training RM | Training Data=HuggingF... | 2026.05 | 93.1 |
| Skywork-Reward-V2-Llama-3.2-3B | Description=Initial re... | 2026.05 | 93 |
| Mean Reward | Policy Model=Qwen3-4B-... | 2026.05 | 83.4 |
| Mean Reward | Policy Model=Qwen2.5-3... | 2026.05 | 83.1 |