PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
About
Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. However, existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. Inspired by Process Reward Models (PRMs) that enhance reasoning in Large Language Models, we propose Promise, a novel framework that integrates dense, step-by-step verification into generative models. Promise features a lightweight PRM to assess the quality of intermediate inference steps, coupled with a PRM-guided Beam Search strategy that leverages dense feedback to dynamically prune erroneous branches. Crucially, our approach unlocks Test-Time Scaling Laws for recommender systems: by increasing inference compute, smaller models can match or surpass larger models. Extensive offline experiments and online A/B tests on a large-scale platform demonstrate that Promise effectively mitigates Semantic Drift, significantly improving recommendation accuracy while enabling efficient deployment.
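The PRM-guided beam search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model.next_token_logprobs` and `prm.score` are assumed interfaces (a generative model proposing the next Semantic ID token, and a process reward model scoring the partial trajectory), and `prm_weight` is a hypothetical mixing coefficient.

```python
import heapq

def prm_guided_beam_search(model, prm, beam_width, num_levels, prm_weight=0.5):
    """Sketch: generate hierarchical Semantic IDs level by level, re-ranking
    candidate prefixes with a dense Process Reward Model (PRM) score so that
    drifting prefixes are pruned before the error becomes irreversible."""
    beams = [((), 0.0)]  # (Semantic ID prefix, combined score)
    for _level in range(num_levels):
        candidates = []
        for prefix, score in beams:
            # Assumed interface: {token: log_prob} for the next ID level.
            for token, logp in model.next_token_logprobs(prefix).items():
                new_prefix = prefix + (token,)
                # Assumed interface: dense, per-step quality estimate
                # for the partial generation trajectory.
                step_reward = prm.score(new_prefix)
                combined = score + logp + prm_weight * step_reward
                candidates.append((new_prefix, combined))
        # Keep only the top-k prefixes; erroneous branches are dropped here
        # instead of surviving until the final sequence is scored.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams
```

Increasing `beam_width` is one knob for spending more inference compute, which is how test-time scaling would enter a search loop of this shape.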
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sequential Recommendation | Amazon Beauty (test) | NDCG@10 | 4.37 | 107 |
| Recommendation | Kuaishou industrial-scale | Recall@100 | 16.09 | 24 |
| Recommendation | Kuaishou Online A/B (test) | Total App Usage Time Lift (%) | 0.121 | 2 |
| Recommendation | Kuaishou Lite Online A/B (test) | Total App Usage Time | 0.0013 | 1 |