Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

About

Reward models (RMs) are crucial for training large language models (LLMs) and for scaling them at inference time. However, existing reward models focus primarily on human preferences and neglect verifiable correctness signals, which have shown strong potential for training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals covering different aspects to provide reliable rewards. We implement a reward agent, named RewardAgent, that combines human preference rewards with two verifiable signals, factuality and instruction following, to provide more reliable rewards. We conduct comprehensive experiments on existing reward model benchmarks and on inference-time best-of-n search over real-world downstream tasks. RewardAgent significantly outperforms vanilla reward models, demonstrating its effectiveness. We further construct training preference pairs using RewardAgent and train an LLM with the DPO objective, achieving superior performance on various NLP benchmarks compared to using conventional reward models. Our code is publicly released to facilitate further research (https://github.com/THU-KEG/Agentic-Reward-Modeling).
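The abstract describes three uses of a combined reward: scoring a response with a preference reward plus verifiable checks, picking the best of n candidates, and building DPO preference pairs. The Python sketch below illustrates that flow under stated assumptions. The weighted-sum aggregation, the toy scorers, and the names `AgenticReward`, `best_of_n`, and `dpo_pair` are hypothetical stand-ins chosen for illustration; they are not RewardAgent's actual components, aggregation rule, or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A scorer maps (prompt, response) to a float. All scorers below are toy stand-ins.
Scorer = Callable[[str, str], float]


@dataclass
class AgenticReward:
    """Combines a preference reward with verifiable-correctness checks.

    ASSUMPTION: a weighted sum is used purely for illustration; it is not
    necessarily how RewardAgent aggregates its signals.
    """
    preference_rm: Scorer
    verifiers: Dict[str, Scorer] = field(default_factory=dict)
    weights: Dict[str, float] = field(default_factory=dict)

    def score(self, prompt: str, response: str) -> float:
        # Preference term, weighted by weights["preference"] (default 1.0).
        total = self.weights.get("preference", 1.0) * self.preference_rm(prompt, response)
        # Add each verifiable signal with its own weight (default 1.0).
        for name, check in self.verifiers.items():
            total += self.weights.get(name, 1.0) * check(prompt, response)
        return total


def best_of_n(reward: AgenticReward, prompt: str, candidates: List[str]) -> str:
    """Best-of-n: return the candidate with the highest combined reward."""
    return max(candidates, key=lambda c: reward.score(prompt, c))


def dpo_pair(reward: AgenticReward, prompt: str, candidates: List[str]) -> Dict[str, str]:
    """Build one preference pair: highest-scored vs. lowest-scored candidate."""
    ranked = sorted(candidates, key=lambda c: reward.score(prompt, c))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}


# --- Hypothetical toy scorers (not the paper's models or verifiers) ---
def toy_preference(prompt: str, response: str) -> float:
    # Stand-in for a learned RM that (naively) favors longer answers.
    return min(len(response), 40) / 40


def toy_factuality(prompt: str, response: str) -> float:
    # Stand-in for a fact checker: 1.0 if the known answer appears.
    return 1.0 if "paris" in response.lower() else 0.0


def toy_instruction_following(prompt: str, response: str) -> float:
    # Stand-in for a constraint checker: the prompt demands lowercase.
    return 1.0 if response == response.lower() else 0.0


PROMPT = "What is the capital of France? Answer in lowercase."
CANDIDATES = [
    "The capital of France is Lyon, a large and historic city.",
    "the capital of france is paris.",
    "Paris.",
]

# Preference reward only vs. preference reward plus two verifiable signals.
vanilla = AgenticReward(preference_rm=toy_preference)
agentic = AgenticReward(
    preference_rm=toy_preference,
    verifiers={
        "factuality": toy_factuality,
        "instruction_following": toy_instruction_following,
    },
)

if __name__ == "__main__":
    print(best_of_n(vanilla, PROMPT, CANDIDATES))  # the long but wrong Lyon answer
    print(best_of_n(agentic, PROMPT, CANDIDATES))  # the capital of france is paris.
    print(dpo_pair(agentic, PROMPT, CANDIDATES))
```

With the preference score alone, the longest answer wins even though it is factually wrong and ignores the lowercase instruction; adding the two verifiable checks changes the selection to the correct, compliant answer. Pairing the highest- and lowest-scored candidates is only one simple way to form DPO pairs and is an assumption of this sketch.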

Hao Peng, Yunjia Qi, Xiaozhi Wang, Zijun Yao, Bin Xu, Lei Hou, Juanzi Li • 2025

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	Accuracy 59.5	881
Instruction Following	IFEval	--	836
Multi-turn Dialogue Evaluation	MT-Bench	Overall Score 6.1	532
Multi-task Language Understanding	MMLU-Pro	Accuracy 31.3	248
Question Answering	TriviaQA	Accuracy 55.3	238
Truthfulness Evaluation	TruthfulQA	Accuracy 48.5	108
Reward Modeling	RM-Bench Chat Hard	Accuracy 60.2	34
Reward Modeling	RM-Bench Chat subset Normal	Accuracy 86	16
Reward Modeling	IFBench Normal	Accuracy 80.5	16
Reward Modeling	IFBench Hard	Accuracy 78	16

