Auto-Rubric: Learning From Implicit Weights to Explicit Rubrics for Reward Modeling

About

Conventional reward modeling relies on gradient descent over neural weights, creating opaque, data-hungry "black boxes." We propose a paradigm shift from implicit to explicit reward parameterization, recasting optimization from continuous weight spaces to the discrete space of natural language rubrics. We introduce a training-free framework based on iterative rubric learning: it locally induces discriminative criteria via verification-driven refinement, and globally compresses the candidate criteria pool into a compact core set by maximizing an information-theoretic coding rate objective. We organize the compressed core set into a hierarchical rubric structure -- high-level evaluation dimensions supported by concrete verification checks -- serving as an interpretable, portable reward function. Empirically, our approach challenges prevailing data scaling assumptions: using only 70 preference pairs, our rubric-guided judges outperform fully trained reward models on diverse benchmarks. For instance, Qwen3-8B equipped with our learned rubrics achieves 80.91% on RewardBench2, surpassing the specialized Skywork-Reward-V2-Qwen3-8B (78.20%). These results demonstrate that alignment signals are highly compressible and can be effectively captured through explicit symbolic search.
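The global compression step can be illustrated with a small sketch. The abstract does not state the exact objective, so this assumes the coding rate commonly used in rate-reduction work, R = 1/2 log det(I + d/(n eps^2) G), where G is the Gram matrix of n candidate-criterion embeddings of dimension d. The greedy selection loop, the `eps` value, the toy embeddings, and all function names are hypothetical and are not taken from the paper.

```python
import math


def log_det_spd(m):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def coding_rate(vectors, eps=0.5):
    """Assumed objective: 1/2 * logdet(I + d / (n * eps^2) * Gram(vectors))."""
    n = len(vectors)
    if n == 0:
        return 0.0
    d = len(vectors[0])
    scale = d / (n * eps ** 2)
    gram = [
        [scale * sum(a * b for a, b in zip(u, v)) + (1.0 if i == j else 0.0)
         for j, v in enumerate(vectors)]
        for i, u in enumerate(vectors)
    ]
    return 0.5 * log_det_spd(gram)


def greedy_core_set(embeddings, k, eps=0.5):
    """Greedily pick up to k indices, each time adding the candidate that
    gives the highest coding rate for the selected set (an illustrative
    heuristic, not necessarily the paper's procedure)."""
    selected = []
    remaining = list(range(len(embeddings)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: coding_rate([embeddings[j] for j in selected + [i]], eps),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy pool: criteria 0 and 1 are redundant, criterion 2 adds a new direction.
pool = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
core = greedy_core_set(pool, k=2)
```

In this toy pool the duplicate criterion adds less coding rate than the orthogonal one, so the selection favors diverse criteria, which is the intuition behind compressing a redundant candidate pool into a compact core set.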

Lipeng Xie, Sen Huang, Zhuo Zhang, Anni Zou, Yunpeng Zhai, Dingchao Ren, Kezun Zhang, Haoyuan Hu, Boyin Liu, Haoran Chen, Zhaoyang Liu, Bolin Ding • 2025

Related benchmarks

Task	Dataset	Result
Reward Modeling	RM-Bench	--	137
Reward Modeling	JudgeBench	Accuracy 74.3	117
Preference Prediction	Arena-Expert-5K, HelpSteer3, HH-RLHF, and UltraFeedback (held-out)	Accuracy 70.5	42
Preference Prediction	MMRB2 out-of-domain	EvalMuse Score 54.1	22
Reward Modeling	PPE Preference ZH	Accuracy 78	19
Reward Modeling	JudgeBench	Reward Modeling Score 80.9	16
Reward Modeling	RewardBench2	Score 82.3	15
Preference Reconstruction	Consequences	Preference Accuracy 56.77	14
Preference Reconstruction	LiTBench Long Stories	Preference Accuracy 63.21	14
Preference Reconstruction	Alternate Uses of Objects	Preference Accuracy 59.22	14
