Reward Modeling on Blessing
[Chart: Agreement Rate over time. Best result: 99.4 (AC-GenRM), Apr 7, 2026]
Updated 11d ago
Evaluation Results
Method             Details              Date     Agreement Rate
AC-GenRM           Backbone=Qwen3-8B    2026.04  99.4
AC-GenRM           Backbone=Qwen3-4B    2026.04  99.3
AC-GenRM           Backbone=Qwen3-1.7B  2026.04  99.1
AC-GenRM           Backbone=Qwen3-0.6B  2026.04  98.9
DeepSeek-R1        Category=Baselines   2026.04  97.7
GPT-4.1-Mini       Category=Baselines   2026.04  96.9
Claude-3.5-Haiku   Category=Baselines   2026.04  94.9
Base               Backbone=Qwen3-8B    2026.04  94.4
Base               Backbone=Qwen3-4B    2026.04  93.8
GPT-4.1            Category=Baselines   2026.04  92.3
DeepSeek-V3        Category=Baselines   2026.04  90.6
Claude-Sonnet-3.7  Category=Baselines   2026.04  90.2
Base               Backbone=Qwen3-1.7B  2026.04  84.9
Base               Backbone=Qwen3-0.6B  2026.04  64.3