Reward Modeling on Overall Performance (test)
[Chart: Overall score by date. Top entry: CE-RM-4B, 80.2 Overall, Jan 28, 2026]
Updated 4d ago
Evaluation Results
| Method | Details | Links | Overall |
|---|---|---|---|
| CE-RM-4B | RM Type=Pointwise, Sca... | 2026.01 | 80.2 |
| CE-RM-4B | RM Type=Pointwise, Sca... | 2026.01 | 79.2 |
| CE-RM-4B | RM Type=Pointwise, Sca... | 2026.01 | 77.4 |
| TIR-Judge-Zero-8B | RM Type=Pointwise | 2026.01 | 73.8 |
| TIR-Judge-Distill-8B | RM Type=Pointwise | 2026.01 | 73.7 |
| TIR-Judge-Zero-4B | RM Type=Pointwise | 2026.01 | 71.7 |
| Gemini-2.5-Flash | RM Type=Pointwise | 2026.01 | 71.2 |
| TIR-Judge-Distill-4B | RM Type=Pointwise | 2026.01 | 69.7 |
| CompassJudger1-32B | RM Type=Pointwise | 2026.01 | 56.5 |