Preference Prediction on RewardBench
Top result: 91.8 Accuracy (C2)
[Chart: Accuracy over time, Apr 15, 2026 to May 10, 2026]
Updated 22d ago
Evaluation Results
Method                                Setting                    Date     Accuracy
Reasoning RM + External-Rubric (32B)  Backbone=Qwen3-8B          2026.04  91.3
C2                                    Backbone=Qwen3-8B          2026.04  91.8
Reasoning RM + Self-Rubric            Backbone=Qwen3-8B          2026.04  90.8
Reasoning RM                          Backbone=Qwen3-8B          2026.04  89.8
Base Model                            Backbone=Qwen3-8B          2026.04  89.1
Reasoning RM + External-Rubric (32B)  Backbone=Tulu3-8B-SFT      2026.04  84.9
C2                                    Backbone=Tulu3-8B-SFT      2026.04  77.2
EvoPref                               Optimization Paradigm=...  2026.05  75.5
ORPO                                  Optimization Paradigm=...  2026.05  75
DPO                                   Optimization Paradigm=...  2026.05  74.9
SMS-EMOA                              Optimization Paradigm=...  2026.05  74.8
EvoPref-Best                          Optimization Paradigm=...  2026.05  74.8
MOEA/D                                Optimization Paradigm=...  2026.05  74.5
KTO                                   Optimization Paradigm=...  2026.05  74.1
CMA-ES                                Optimization Paradigm=...  2026.05  74
IPO                                   Optimization Paradigm=...  2026.05  73.8
Reasoning RM                          Backbone=Tulu3-8B-SFT      2026.04  73.7
Reasoning RM + Self-Rubric            Backbone=Tulu3-8B-SFT      2026.04  70.8
Base Model                            Backbone=Tulu3-8B-SFT      2026.04  67.2