
RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

About

The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for optimizing these agents in recommendation tasks. However, current methodologies remain limited by a reliance on single-dimensional, outcome-based rewards that focus exclusively on final user interactions and overlook critical intermediate capabilities, such as instruction following and complex intent understanding. Despite the need for multi-dimensional rewards, the field lacks a standardized benchmark to support their development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. By supporting comprehensive assessment from syntactic compliance to complex intent grounding and preference modeling, RecRM-Bench provides a foundational dataset for training sophisticated reward models. Furthermore, we propose a systematic framework for constructing multi-dimensional reward models and integrating a hybrid reward function, establishing a robust foundation for developing reliable and highly capable agentic recommender systems. The complete RecRM-Bench dataset is publicly available at https://huggingface.co/datasets/wwzeng/RecRM-Bench.
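As an illustration of what a hybrid reward over the four dimensions might look like, here is a minimal Python sketch. The abstract does not say how the dimensions are combined, so the weighted-mean form, the equal weights, the `outcome_weight` parameter, and every name below (`DimensionScores`, `hybrid_reward`) are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension reward-model scores, each assumed to lie in [0, 1]."""
    instruction_following: float
    factual_consistency: float
    query_item_relevance: float
    behavior_prediction: float


# Hypothetical equal weights; the source does not specify any weighting.
DEFAULT_WEIGHTS = {
    "instruction_following": 0.25,
    "factual_consistency": 0.25,
    "query_item_relevance": 0.25,
    "behavior_prediction": 0.25,
}


def hybrid_reward(scores, outcome_reward, weights=None, outcome_weight=0.5):
    """Blend an outcome reward with a weighted mean of dimension scores.

    This weighted-sum form is an assumed stand-in, not the paper's formula.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    if not 0.0 <= outcome_weight <= 1.0:
        raise ValueError("outcome_weight must be in [0, 1]")
    # Weighted mean of the intermediate (process-level) dimension scores.
    process = sum(w * getattr(scores, name) for name, w in weights.items()) / total
    # Convex combination of the final-outcome signal and the process signal.
    return outcome_weight * outcome_reward + (1.0 - outcome_weight) * process


example = DimensionScores(
    instruction_following=1.0,
    factual_consistency=0.5,
    query_item_relevance=1.0,
    behavior_prediction=0.5,
)
reward = hybrid_reward(example, outcome_reward=1.0)
```

With these assumed inputs, the dimension scores average to 0.75, so the blended reward falls between that value and the outcome reward of 1.0. A convex combination keeps the result in [0, 1] whenever the inputs are, which is one simple way to stop any single dimension from dominating.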

Wenwen Zeng, Jinhui Zhang, Hao Chen, Zhaoyu Hu, Yongqi Liang, Jiajun Chai, Dengcan Liu, Zhenfeng Liu, Shurui Yan, Minglong Xue, Xiaohan Wang, Wei Lin, Guojun Yin • 2026

Related benchmarks

Task                    Dataset      Result              Rank
Behavior Prediction     RecRM-Bench  Accuracy 77.78      8
Factual Consistency     RecRM-Bench  Accuracy (%) 70.71  8
Instruction Following   RecRM-Bench  Accuracy 72.66      8
Item Ranking            RecRM-Bench  Accuracy 86.78      8
Query-Item Relevance    RecRM-Bench  Accuracy 89.36      8