PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations

About

Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. However, existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. Inspired by Process Reward Models (PRMs) that enhance reasoning in Large Language Models, we propose Promise, a novel framework that integrates dense, step-by-step verification into generative models. Promise features a lightweight PRM to assess the quality of intermediate inference steps, coupled with a PRM-guided Beam Search strategy that leverages dense feedback to dynamically prune erroneous branches. Crucially, our approach unlocks Test-Time Scaling Laws for recommender systems: by increasing inference compute, smaller models can match or surpass larger models. Extensive offline experiments and online A/B tests on a large-scale platform demonstrate that Promise effectively mitigates Semantic Drift, significantly improving recommendation accuracy while enabling efficient deployment.
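The abstract describes PRM-guided Beam Search only at a high level. Here is a minimal Python sketch of the general idea, under stated assumptions. Semantic IDs are decoded one hierarchy level at a time. Every partial prefix is ranked by its cumulative policy log-probability plus a weighted PRM score. Only the top `beam_width` prefixes survive to the next level. The function name `prm_guided_beam_search`, the `policy`/`prm` callables, the `prm_weight` parameter, and the additive ranking rule are all hypothetical illustrations, not the authors' implementation. The toy policy and PRM exist only to make Semantic Drift visible.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[int, ...]  # a partial Semantic ID, coarse token first


def prm_guided_beam_search(
    policy: Callable[[Prefix], Dict[int, float]],  # hypothetical: next-token log-probs
    prm: Callable[[Prefix], float],                # hypothetical: step-level prefix score
    num_levels: int,
    beam_width: int,
    prm_weight: float = 1.0,
) -> List[Tuple[Prefix, float]]:
    """Decode hierarchical Semantic IDs level by level (assumed ranking rule:
    cumulative log-prob + prm_weight * PRM score of the current prefix)."""
    # Each beam entry: (prefix, cumulative policy log-prob, ranking score).
    beams: List[Tuple[Prefix, float, float]] = [((), 0.0, 0.0)]
    for _ in range(num_levels):
        candidates: List[Tuple[Prefix, float, float]] = []
        for prefix, logp, _score in beams:
            for token, token_logp in policy(prefix).items():
                new_prefix = prefix + (token,)
                new_logp = logp + token_logp
                # Dense feedback: the PRM scores every intermediate prefix,
                # not only the completed Semantic ID.
                score = new_logp + prm_weight * prm(new_prefix)
                candidates.append((new_prefix, new_logp, score))
        # Prune to the top `beam_width` prefixes before expanding the next level.
        candidates.sort(key=lambda c: c[2], reverse=True)
        beams = candidates[:beam_width]
    return [(prefix, score) for prefix, _logp, score in beams]


def toy_policy(prefix: Prefix) -> Dict[int, float]:
    # Hypothetical policy: slightly prefers coarse token 0,
    # although the relevant item lives under coarse token 1.
    if not prefix:
        return {0: math.log(0.6), 1: math.log(0.4)}
    return {0: math.log(0.5), 1: math.log(0.5)}


def toy_prm(prefix: Prefix) -> float:
    # Hypothetical PRM: rewards prefixes inside the relevant subspace.
    return 2.0 if prefix[0] == 1 else -2.0


if __name__ == "__main__":
    # Without PRM feedback a width-1 beam commits to coarse token 0 and cannot recover.
    print(prm_guided_beam_search(toy_policy, toy_prm, num_levels=2, beam_width=1, prm_weight=0.0))
    # With PRM feedback the erroneous branch is pruned at the first level.
    print(prm_guided_beam_search(toy_policy, toy_prm, num_levels=2, beam_width=1, prm_weight=1.0))
```

In this sketch, `beam_width` (and the number of PRM calls it implies) is the knob that trades inference compute for accuracy. That is the kind of test-time scaling the abstract refers to, though how Promise actually allocates that compute is not specified here.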

Chengcheng Guo, Kuo Cai, Yu Zhou, Qiang Luo, Ruiming Tang, Han Li, Kun Gai, Guorui Zhou • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sequential Recommendation | Amazon Beauty (test) | NDCG@10 | 4.37 | 107 |
| Recommendation | Kuaishou industrial-scale | Recall@100 | 16.09 | 24 |
| Recommendation | Kuaishou Online A/B (test) | Total App Usage Time Lift (%) | 0.121 | 2 |
| Recommendation | Kuaishou Lite Online A/B (test) | Total App Usage Time | 0.0013 | 1 |
