Preference Prediction on RM-Bench
Top result: C2, 87.8 Accuracy (Apr 15, 2026)
Evaluation Results
| Method | Backbone | Date | Accuracy |
|---|---|---|---|
| C2 | Qwen3-8B | 2026.04 | 87.8 |
| Reasoning RM + External-Rubric (32B) | Qwen3-8B | 2026.04 | 84.6 |
| Reasoning RM | Qwen3-8B | 2026.04 | 81.3 |
| Reasoning RM + Self-Rubric | Qwen3-8B | 2026.04 | 81.3 |
| Base Model | Qwen3-8B | 2026.04 | 80.1 |
| Reasoning RM + External-Rubric (32B) | Tulu3-8B-SFT | 2026.04 | 77.7 |
| C2 | Tulu3-8B-SFT | 2026.04 | 65.6 |
| Reasoning RM | Tulu3-8B-SFT | 2026.04 | 64.9 |
| Reasoning RM + Self-Rubric | Tulu3-8B-SFT | 2026.04 | 64.2 |
| Base Model | Tulu3-8B-SFT | 2026.04 | 56.1 |