PROMISE: Process Reward Models Unlock Test-Time Scaling Laws in Generative Recommendations
About
Generative Recommendation has emerged as a promising paradigm, reformulating recommendation as a sequence-to-sequence generation task over hierarchical Semantic IDs. However, existing methods suffer from a critical issue we term Semantic Drift, where errors in early, high-level tokens irreversibly divert the generation trajectory into irrelevant semantic subspaces. Inspired by Process Reward Models (PRMs) that enhance reasoning in Large Language Models, we propose Promise, a novel framework that integrates dense, step-by-step verification into generative models. Promise features a lightweight PRM to assess the quality of intermediate inference steps, coupled with a PRM-guided Beam Search strategy that leverages dense feedback to dynamically prune erroneous branches. Crucially, our approach unlocks Test-Time Scaling Laws for recommender systems: by increasing inference compute, smaller models can match or surpass larger models. Extensive offline experiments and online A/B tests on a large-scale platform demonstrate that Promise effectively mitigates Semantic Drift, significantly improving recommendation accuracy while enabling efficient deployment.
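The PRM-guided beam search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model.next_token_logprobs` and `prm.score` are assumed interfaces (a generative model proposing the next Semantic ID token, and a process reward model scoring the partial trajectory), and `prm_weight` is a hypothetical mixing coefficient.

```python
import heapq

def prm_guided_beam_search(model, prm, beam_width, num_levels, prm_weight=0.5):
    """Sketch: generate hierarchical Semantic IDs level by level, re-ranking
    candidate prefixes with a dense Process Reward Model (PRM) score so that
    drifting prefixes are pruned before the error becomes irreversible."""
    beams = [((), 0.0)]  # (Semantic ID prefix, combined score)
    for _level in range(num_levels):
        candidates = []
        for prefix, score in beams:
            # Assumed interface: {token: log_prob} for the next ID level.
            for token, logp in model.next_token_logprobs(prefix).items():
                new_prefix = prefix + (token,)
                # Assumed interface: dense, per-step quality estimate
                # for the partial generation trajectory.
                step_reward = prm.score(new_prefix)
                combined = score + logp + prm_weight * step_reward
                candidates.append((new_prefix, combined))
        # Keep only the top-k prefixes; erroneous branches are dropped here
        # instead of surviving until the final sequence is scored.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return beams
```

Increasing `beam_width` is one knob for spending more inference compute, which is how test-time scaling would enter a search loop of this shape.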
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Sequential Recommendation | Amazon Beauty (test) | NDCG@10 | 4.37 | 107 |
| Recommendation | Kuaishou industrial-scale | Recall@100 | 16.09 | 24 |
| Recommendation | Kuaishou Online A/B (test) | Total App Usage Time Lift (%) | 0.121 | 2 |
| Recommendation | Kuaishou Lite Online A/B (test) | Total App Usage Time | 0.0013 | 1 |