Preference Prediction on RewardBench
[Chart: Accuracy by date. Top result: C2, 91.8 accuracy (Apr 15, 2026).]
Evaluation Results
Method                                  Backbone        Date      Accuracy
--------------------------------------  --------------  --------  --------
C2                                      Qwen3-8B        2026.04   91.8
Reasoning RM + External-Rubric (32B)    Qwen3-8B        2026.04   91.3
Reasoning RM + Self-Rubric              Qwen3-8B        2026.04   90.8
Reasoning RM                            Qwen3-8B        2026.04   89.8
Base Model                              Qwen3-8B        2026.04   89.1
Reasoning RM + External-Rubric (32B)    Tulu3-8B-SFT    2026.04   84.9
C2                                      Tulu3-8B-SFT    2026.04   77.2
Reasoning RM                            Tulu3-8B-SFT    2026.04   73.7
Reasoning RM + Self-Rubric              Tulu3-8B-SFT    2026.04   70.8
Base Model                              Tulu3-8B-SFT    2026.04   67.2