Preference Prediction on RewardBench 2
[Chart: Accuracy over time. Best result: 73.9, Reasoning RM + External-Rubric (32B), Apr 15, 2026.]
Evaluation Results
Method                                Backbone      Date     Accuracy
Reasoning RM + External-Rubric (32B)  Qwen3-8B      2026.04  73.9
C2                                    Qwen3-8B      2026.04  71
Base Model                            Qwen3-8B      2026.04  69.7
Reasoning RM + Self-Rubric            Qwen3-8B      2026.04  69.4
Reasoning RM                          Qwen3-8B      2026.04  67.6
Reasoning RM + External-Rubric (32B)  Tulu3-8B-SFT  2026.04  59.6
C2                                    Tulu3-8B-SFT  2026.04  50.7
Reasoning RM                          Tulu3-8B-SFT  2026.04  45.6
Reasoning RM + Self-Rubric            Tulu3-8B-SFT  2026.04  40.8
Base Model                            Tulu3-8B-SFT  2026.04  35.2