Preference Prediction on JudgeBench
Top result: 63.9 Positional Consistent Accuracy, Reasoning RM + External-Rubric (32B), Apr 15, 2026
Evaluation Results
| Method | Backbone | Date | Positional Consistent Accuracy |
|---|---|---|---|
| Reasoning RM + External-Rubric (32B) | Qwen3-8B | 2026.04 | 63.9 |
| C2 | Qwen3-8B | 2026.04 | 63.5 |
| Base Model | Qwen3-8B | 2026.04 | 60.9 |
| Reasoning RM + Self-Rubric | Qwen3-8B | 2026.04 | 60.8 |
| Reasoning RM | Qwen3-8B | 2026.04 | 60.1 |
| Reasoning RM + External-Rubric (32B) | Tulu3-8B-SFT | 2026.04 | 59.2 |
| C2 | Tulu3-8B-SFT | 2026.04 | 39.8 |
| Reasoning RM | Tulu3-8B-SFT | 2026.04 | 35.8 |
| Reasoning RM + Self-Rubric | Tulu3-8B-SFT | 2026.04 | 35.2 |
| Base Model | Tulu3-8B-SFT | 2026.04 | 22.7 |