SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation
About
Reinforcement learning-based preference optimization is increasingly used to align list-wise generative recommenders with complex, multi-objective user feedback, yet existing optimizers such as Gradient-Bounded Policy Optimization (GBPO) exhibit structural limitations in recommendation settings. We identify a Symmetric Conservatism failure mode in which symmetric update bounds suppress learning from rare positive signals (e.g., cold-start items), static negative-sample constraints fail to prevent diversity collapse under rejection-dominated feedback, and group-normalized multi-objective rewards lead to low-resolution training signals. To address these issues, we propose SAGE (Sequence-level Adaptive Gradient Evolution), a unified optimizer designed for list-wise generative recommendation. SAGE introduces sequence-level signal alignment via a geometric-mean importance ratio and a decoupled multi-objective advantage estimator to reduce token-level variance and mitigate reward collapse, together with asymmetric adaptive bounding that applies positive Boost updates to successful slates and an entropy-aware penalty to discourage low-diversity failures. Experiments on Amazon Product Reviews and the large-scale RecIF-Bench demonstrate consistent improvements in top-K accuracy, cold-start recall, and diversity across both Semantic-ID and native-text action spaces, while preserving numerical stability during training. These results suggest that asymmetric, sequence-aware policy optimization provides a principled and effective framework for addressing optimization failures in generative recommendation.
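The abstract names three mechanisms: a sequence-level geometric-mean importance ratio, a decoupled multi-objective advantage, and asymmetric bounding with a positive Boost and an entropy-aware penalty. The NumPy sketch below illustrates one plausible reading of each. It is not the authors' implementation. The geometric-mean ratio follows from the description. Several details are assumptions: per-objective group normalization for the decoupled advantage, a wider upper clip for positive advantages (`eps_boost`), and a penalty scale that grows as slate entropy falls (`beta`, `slate_entropy`). All function and parameter names are hypothetical.

```python
import numpy as np


def sequence_ratio(logp_new, logp_old, mask):
    # Geometric mean of per-token ratios pi_new/pi_old over each sequence's valid tokens:
    # exp(mean_t(log pi_new - log pi_old)) == (prod_t pi_new/pi_old) ** (1/T).
    diff = (logp_new - logp_old) * mask
    lengths = np.maximum(mask.sum(axis=1), 1)  # guard against empty sequences
    return np.exp(diff.sum(axis=1) / lengths)


def decoupled_advantage(rewards, weights, eps=1e-8):
    # rewards: (G, K) array for G slates in one group and K objectives (assumed layout).
    # Each objective is normalized within the group *before* combining, so an objective
    # with a large raw scale cannot drown out the others.
    mean = rewards.mean(axis=0, keepdims=True)
    std = rewards.std(axis=0, keepdims=True)
    return ((rewards - mean) / (std + eps)) @ weights


def slate_entropy(item_ids):
    # Hypothetical diversity proxy: Shannon entropy (nats) of item frequencies in one slate.
    _, counts = np.unique(item_ids, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def asymmetric_surrogate(ratio, adv, entropy, max_entropy,
                         eps_low=0.2, eps_boost=0.5, beta=1.0):
    # Clipped surrogate loss (to minimize) with asymmetric bounds (assumed form):
    # positive advantages get a wider upper bound ("Boost"); negative advantages keep the
    # standard bound but are scaled up when the slate has low diversity.
    pos = adv >= 0
    upper = np.where(pos, 1.0 + eps_boost, 1.0 + eps_low)
    clipped = np.clip(ratio, 1.0 - eps_low, upper)
    surrogate = np.minimum(ratio * adv, clipped * adv)
    low_div = 1.0 - entropy / max_entropy  # 0 for a fully diverse slate, 1 when collapsed
    scale = np.where(pos, 1.0, 1.0 + beta * low_div)
    return -(scale * surrogate).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logp_old = np.log(rng.uniform(0.1, 1.0, size=(4, 6)))
    logp_new = logp_old + rng.normal(0.0, 0.1, size=(4, 6))
    ratio = sequence_ratio(logp_new, logp_old, np.ones((4, 6)))
    rewards = rng.normal(size=(4, 2))  # e.g. accuracy and cold-start objectives (hypothetical)
    adv = decoupled_advantage(rewards, np.array([0.5, 0.5]))
    slates = [[1, 2, 3, 4], [1, 1, 2, 3], [5, 5, 5, 5], [6, 7, 8, 9]]
    ent = np.array([slate_entropy(s) for s in slates])
    print(asymmetric_surrogate(ratio, adv, ent, max_entropy=np.log(4)))
```

Averaging log-ratios over tokens keeps a long slate's ratio from exploding or vanishing the way a raw product of per-token ratios can. Normalizing objectives separately keeps each one's within-group ranking visible in the final advantage.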
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Ad Recommendation | RecIF-Bench Ad Rec | Pass@1 | 0.0273 | 20 |
| Label-Conditional Recommendation | RecIF-Bench Label-Cond. Rec | Pass@32 | 0.0574 | 20 |
| Product Recommendation | RecIF-Bench Product Rec | Pass@1 | 2.31 | 20 |
| Short Video Recommendation | RecIF-Bench Short Video Rec | Pass@1 | 5.74 | 20 |
| Interactive Recommendation | RecIF-Bench Interactive Rec | Pass@1 | 13.1 | 11 |
| Label Prediction | RecIF-Bench | AUC | 0.7017 | 9 |
| Sequential Recommendation | Amazon Product Reviews Beauty | Recall@5 | 6.83 | 9 |
| Sequential Recommendation | Amazon Product Reviews Sports | Recall@5 | 3.85 | 9 |
| Sequential Recommendation | Amazon Product Reviews Toys | Recall@5 | 7.38 | 9 |
| Generative Recommendation | Amazon Product Reviews Beauty (test) | Entropy@10 | 2.551 | 3 |
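The table reports Recall@5 on the Amazon splits and Entropy@10 for diversity but does not define either metric. The sketch below shows one common way to compute each. It treats Recall@K as the share of users whose single held-out item appears in their top-K list. It treats Entropy@K as the Shannon entropy (natural log assumed) of the item distribution pooled over all users' top-K lists. Both definitions and the function names are assumptions. The benchmarks' own evaluation code may differ, for example in logarithm base or in reporting Recall as a percentage.

```python
from collections import Counter

import numpy as np


def recall_at_k(ranked_lists, targets, k):
    # Assumed leave-one-out setup: one held-out target item per user.
    hits = [t in ranked[:k] for ranked, t in zip(ranked_lists, targets)]
    return float(np.mean(hits))


def entropy_at_k(ranked_lists, k):
    # Pool the top-k items of every user, then take the entropy (nats) of item frequencies.
    # Higher values mean recommendations are spread over more distinct items.
    counts = Counter(item for ranked in ranked_lists for item in ranked[:k])
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    lists = [[3, 1, 4, 5], [9, 2, 6, 5], [3, 5, 8, 9]]
    print(recall_at_k(lists, targets=[4, 7, 8], k=3))
    print(entropy_at_k(lists, k=3))
```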